(54) Title: TABLE-BASED COMPRESSION WITH EMBEDDED CODING
(57) Abstract

An image compression system includes a vectorizer and a
hierarchical vector quantization table that outputs embedded code.
The vectorizer converts an image into image vectors representing
respective blocks of image pixels. The table provides computation-free
transformation and compression of the image vectors. Table
design can be divided into codebook design and fill-in procedures
for each stage. Codebook design for the preliminary stages
uses a splitting generalized Lloyd algorithm (LBG/GLA) using a
perceptually weighted distortion measure. Codebook design for
the final stage uses a greedily-grown and then entropy-pruned
tree-structure variation of GLA with an entropy-constrained distortion
measure. Table fill-in for all stages uses an unweighted proximity
measure for assigning inputs to codebook vectors. Transformations
and compression are fast because they are computation free. The
hierarchical, multi-stage character of the table allows it to operate
with low memory requirements. The embedded output allows
convenient scalability suitable for collaborative video applications
over heterogeneous networks.

TABLE-BASED COMPRESSION WITH EMBEDDED CODING

BACKGROUND OF THE INVENTION 

The present invention relates to data processing and, more particularly, to data 
compression, for example as applied to still and video images, speech and music. A major 
objective of the present invention is to enhance collaborative video applications over 
heterogeneous networks of inexpensive general purpose computers. 

As computers are becoming vehicles of human interaction, the demand is rising for the
interaction to be more immediate and complete. Where text-based e-mail and database services
predominated on local networks and on the Internet, the effort is on to provide data-intensive
services such as collaborative video applications, e.g., video conferencing and interactive video.

In most cases, the raw data requirements for such applications far exceed available
bandwidth, so data compression is necessary to meet the demand. Effectiveness is a goal of any
image compression scheme. Speed is a requirement imposed by collaborative applications to
provide an immediacy to interaction. Scalability is a requirement imposed by the heterogeneity of
networks and computers.

Effectiveness can be measured in terms of the amount of distortion resulting for a given
degree of compression. The distortion can be expressed in terms of the square of the difference
between corresponding pixels averaged over the image, i.e., mean square error (less is better).
The mean square error can be: 1) weighted, for example, to take variations in perceptual
sensitivity into account; or 2) unweighted.

The extent of compression can be measured either as a compression ratio or a bit rate. The
compression ratio (more is better) is the number of bits of an input value divided by the number of
bits in the expression of that value in the compressed code (averaged over a large number of input
values if the code is variable length). The bit rate is the number of bits of compressed code
required to represent an input value. Compression effectiveness can be characterized by a plot of
distortion as a function of bit rate.
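
The measures defined above can be illustrated with a short Python sketch. The function names, the tiny 2x2 example image, and the assumption of eight bits per input pixel are illustrative choices, not part of the original disclosure.

```python
import numpy as np

def mean_square_error(original, reconstructed, weights=None):
    """Mean square error between two images; optional per-pixel weights."""
    diff = np.asarray(original, dtype=np.float64) - np.asarray(reconstructed, dtype=np.float64)
    squared = diff ** 2
    if weights is None:
        return float(squared.mean())          # unweighted case
    w = np.asarray(weights, dtype=np.float64)
    return float((w * squared).sum() / w.sum())  # weighted case

def bit_rate(total_compressed_bits, num_input_values):
    """Average number of compressed bits per input value."""
    return total_compressed_bits / num_input_values

def compression_ratio(bits_per_input_value, rate):
    """Bits per input value divided by compressed bits per input value."""
    return bits_per_input_value / rate

# Hypothetical 2x2 image and its reconstruction.
original = np.array([[10, 20], [30, 40]])
reconstructed = np.array([[12, 20], [30, 36]])

mse = mean_square_error(original, reconstructed)
rate = bit_rate(total_compressed_bits=8, num_input_values=4)
ratio = compression_ratio(bits_per_input_value=8, rate=rate)
```

Evaluating `mean_square_error` at several bit rates gives the points of the distortion-versus-bit-rate plot mentioned above.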

Ideally, there would be zero distortion, and there are lossless compression techniques that
achieve this. However, lossless compression techniques tend to be limited to compression ratios
of about 2, whereas compression ratios of 20 to 500 are desired for collaborative video
applications. Lossy compression techniques always result in some distortion. However, the
distortion can be acceptable, even imperceptible, while much greater compression is achieved.

Collaborative video is desired for communication between general purpose computers over
heterogeneous networks, including analog phone lines, digital phone lines, and local-area
networks. Encoding and decoding are often computationally intensive and thus can introduce
latencies or bottlenecks in the data stream. Often dedicated hardware is required to accelerate
encoding and decoding. However, requiring dedicated hardware greatly reduces the market for
collaborative video applications. For collaborative video, fast, software-based compression
would be highly desirable.

Heterogeneous networks of general purpose computers present a wide range of channel
capacities and decoding capabilities. One approach would be to compress image data more than
once and to different degrees for the different channels and computers. However, this is
burdensome on the encoding end and provides no flexibility for different computing power on the
receiving end. A better solution is to compress image data into a low-compression/low-distortion
code that is readily scalable to greater compression at the expense of greater distortion.

State-of-the-art compression schemes have been promulgated as standards by the
international Moving Picture Experts Group; the current standards are MPEG-1 and MPEG-2.
These standards are well suited for applications involving playback of video encoded off-line. For
example, they are well suited to playback of CD-ROM and DVD disks. However, compression
effectiveness is non-optimal, encoding requirements are excessive, and scalability is too limited.
These limitations can be better understood with the following explanation.

Most compression schemes operate on digital images that are expressed as a two-dimensional
array of picture elements (pixels), each with one (as in a monochrome or gray-scale
image) or more (as in a color image) assigned values. Commonly, a color image is
treated as a superposition of three independent monochrome images for purposes of compression.

The lossy compression techniques practically required for video compression generally
involve quantization applied to monochrome (gray-scale or color component) images. In
quantization, a high-precision image description is converted to a low-precision image description,
typically through a many-to-one mapping. Quantization techniques can be divided into scalar
quantization (SQ) techniques and vector quantization (VQ) techniques. While scalars can be
considered one-dimensional vectors, there are important qualitative distinctions between the two
quantization techniques.

Vector quantization can be used to process an image in blocks, which are represented as
vectors in an n-dimensional space. In most monochrome photographic images, adjacent pixels are
likely to be close in intensity. Vector quantization can take advantage of this fact by assigning
more representative vectors to regions of the n-dimensional space in which adjacent pixels are
close in intensity than to regions of the n-dimensional space in which adjacent pixels are very
different in intensity. In a comparable scalar quantization scheme, each pixel would be
compressed independently; no advantage is taken of the correlations between adjacent pixels.
While scalar quantization techniques can be modified at the expense of additional computations to
take advantage of correlations, comparable modifications can be applied to vector quantization.
Overall, vector quantization provides for more effective compression than does scalar
quantization.

Another difference between vector and scalar quantization is how the representative values
or vectors are represented in the compressed data. In scalar quantization, the compressed data can
include reduced precision expressions of the representative values. Such a representation can be
readily scaled simply by removing one or more least-significant bits from the representative value.
In more sophisticated scalar quantization techniques, the representative values are represented by
indices; however, scaling can still take advantage of the fact that the representative values have a
given order in a metric dimension. In vector quantization, representative vectors are distributed in
an n-dimensional space. Where n > 1, there is no natural order to the representative vectors.
Accordingly, they are assigned effectively arbitrary indices. There is no simple and effective way
to manipulate these indices to make the compression scalable.

The final distinction between vector and scalar quantization is more quantitative than
qualitative. The computations required for quantization scale dramatically (more than linearly)
with the number of pixels involved in a computation. In scalar quantization, one pixel is
processed at a time. In vector quantization, plural pixels are processed at once. In the case of
popular 4x4 and 8x8 block sizes, the number of pixels processed at once becomes 16 and 64,
respectively. To achieve minimal distortion, "full-search" vector quantization computes the
distances in an n-dimensional space of an image vector from each representative vector.
Accordingly, vector quantization tends to be much slower than scalar quantization and, therefore,
limited to off-line compression applications.
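
The cost of full-search vector quantization can be seen in a minimal Python sketch. The three-entry, two-dimensional codebook and the input vectors are hypothetical values chosen for illustration; a real codebook for 4x4 blocks would have sixteen dimensions and far more entries.

```python
import numpy as np

def full_search_vq(vectors, codebook):
    """Return, for each input vector, the index of the nearest codebook vector.

    Every input is compared with every codebook vector, so the work grows
    with (number of inputs) x (codebook size) x (vector dimension).
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    codebook = np.asarray(codebook, dtype=np.float64)
    # Squared Euclidean distances, shape (num_vectors, num_codebook_vectors).
    distances = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return distances.argmin(axis=1)

# Hypothetical codebook and inputs.
codebook = np.array([[0, 0], [10, 10], [0, 10]])
vectors = np.array([[1, 1], [9, 8], [2, 9]])
indices = full_search_vq(vectors, codebook)
```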

Because of its greater effectiveness, considerable effort has been directed to accelerating
vector quantization by eliminating some of the computations required. There are structured
alternatives to "full-search" VQ that reduce the number of computations required per input block at
the expense of a small increase in distortion. Structured VQ techniques perform comparisons in
an ordered manner so as to exclude apparently unnecessary comparisons. All such techniques
involve some risk that the closest comparison will not be found. However, the risk is not large
and the consequence typically is that a second closest point is selected when the first closest point
is not. While the net distortion is larger than with full-search VQ, it is typically better than scalar
quantization performed on each dimension separately.

In "tree-structured" VQ, comparisons are performed in pairs. For example, the first two
measurements can involve codebook points in symmetrical positions in the upper and the lower
halves of a vector space. If an image input vector is closer to the upper codebook point, no further
comparisons with codebook points in the lower half of the space are performed. Tree-structured
VQ works best when the codebook has certain symmetries. However, requiring these symmetries
reduces the flexibility of codebook design so that the resulting codebook is not optimal for
minimizing distortion. Furthermore, while reduced, the computations required by tree-structured
VQ can be excessive for collaborative video applications.

In table-based vector quantization (TBVQ), the assignment of all possible blocks to
codebook vectors is pre-computed and represented in a lookup table. No computations are
required during image compression. However, in the case of 4x4 blocks of pixels, with eight bits
allotted to characterize each pixel, the number of table addresses would be 256^16, which is clearly
impractical. Hierarchical table-based vector quantization (HTBVQ) separates a vector
quantization table into stages; this effectively reduces the memory requirements, but at a cost of
additional distortion.

Further, it is well known that the pixel space in which images are originally expressed is
often not the best for vector quantization. Vector quantization is most effective when the
dimensions differ in perceptual significance. However, in pixel space, the perceptual significance
of the dimensions (which merely represent different pixel positions in a block) does not vary.
Accordingly, vector quantization is typically preceded by a transform such as a wavelet transform.
Thus, the value of eliminating computations during vector quantization is impaired if computations
are required for transformation prior to quantization. While some work has been done integrating a
wavelet transform into a HTBVQ table, the resulting effectiveness has not been satisfactory.

It is recognized that hardware accelerators can be used to improve the encoding rate of data
compression systems. However, this solution is expensive. More importantly, it is awkward
from a distribution standpoint. On the Internet, images and Web pages are presented in many
different formats, each requiring its own viewer or "browser". To reach the largest possible
audience without relying on a lowest common denominator viewing technology, image providers
can download viewing applications to prospective consumers. Obviously, this download
distribution system would not be applicable for hardware-based encoders. If encoders for
collaborative video are to be downloadable, they must be fast enough for real-time operation in
software implementations. Where the applications involve collaborative video over heterogeneous
networks of general purpose computers, there is still a need for a downloadable compression
scheme that provides a more optimal combination of effectiveness, speed, and scalability.

SUMMARY OF THE INVENTION 

The present invention provides for data compression using a hierarchical table
implementing a block transform and outputting a variable-rate, embedded code. There are several
aspects of the invention that are brought together to achieve optimal benefits, but which can be
used separately.

A counterintuitive aspect of the present invention is the incorporation of a codebook of a
type used for structured vector quantization in a compression table. Structured vector quantization
is designed to reduce the computations required for compression while accepting a small increase
in distortion relative to full-search vector quantization. However, this tradeoff is a poor one in the
context of tables, since all the computations are pre-computed.

In the present case, a codebook design procedure used for tree-structured vector
quantization is used, not to reduce computations, but to provide a codebook that can be mapped
readily to an embedded code. In an embedded code, bits are arranged in order of significance.
When the least significant bit of a multi-bit index to a first codebook vector is dropped, the result
is an index of a codebook vector near the first codebook vector. Thus, an embedded code is
readily scaled to provide a variable-rate system.
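
The embedded-code property can be sketched in Python as follows. The recursive mean-split used here to grow the tree is a simple stand-in chosen for brevity, not the codebook design procedure of the invention, and the four training points are hypothetical. Each node is named by its bit path, so dropping the last bit of an index names the parent node, whose centroid lies near the child's.

```python
import numpy as np

def grow_tree(points, depth):
    """Map each bit-string path to the centroid of the training points under it."""
    nodes = {}

    def split(pts, path):
        nodes[path] = pts.mean(axis=0)
        if len(path) == depth or len(pts) < 2:
            return
        axis = int(pts.var(axis=0).argmax())   # split along the widest dimension
        threshold = pts[:, axis].mean()
        low = pts[pts[:, axis] <= threshold]
        high = pts[pts[:, axis] > threshold]
        if len(low) == 0 or len(high) == 0:    # identical points: nothing to split
            return
        split(low, path + "0")
        split(high, path + "1")

    split(np.asarray(points, dtype=np.float64), "")
    return nodes

# Hypothetical training vectors.
training = [[0, 0], [2, 0], [10, 0], [12, 0]]
nodes = grow_tree(training, depth=2)

full_index = "11"                 # two-bit embedded index
scaled_index = full_index[:-1]    # drop the least significant bit
full_vector = nodes[full_index]       # centroid at the leaf
scaled_vector = nodes[scaled_index]   # nearby centroid at the parent node
```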

An embedded code can readily be made variable length to minimize entropy and reduce the
bit rate for a net gain in compression effectiveness. Thus, any loss of effectiveness resulting from
the use of a structured vector quantization codebook is at least partially offset by the gain in
compression effectiveness resulting from the use of a variable-length code.

Another aspect of the invention is the implementation of block transforms in the table.
Block transforms can express data so that information can be separated by significance. This
makes it feasible to apply more compression to less significant data for a net gain in the apparent
effectiveness of the compression.

In the case of image or other sensory data compression, if the space to which the data is
transformed is not perceptually linear, a perceptually weighted proximity measure can be used
during codebook design. In accordance with the present invention, an unweighted or less
perceptually weighted proximity measure should be used during a table fill-in procedure to
minimize distortion.

A further aspect of the invention is the incorporation of considerations other than
perceptually weighted or unweighted proximity measures in codebook design. For example,
entropy constraints can be imposed on codebook design to enhance bit rate. In the (greedy)
growing of a decision tree, a joint entropy and distortion measure can be used to select nodes to be
grown or pruned. If the joint measure is applied on a node-by-node basis, virtually continuous
scalability can be provided while maintaining high compression effectiveness at each available bit
rate.

A final aspect of the invention takes advantage of the lower memory requirements afforded
by hierarchical tables. Hierarchical tables raise the issue of how to incorporate structures,
constraints, and transforms in a table. In the case of the block transforms, the transforms are used
in codebook design at every stage of the table. However, in the case of structures and constraints
used to provide variable-length codes, these are best restricted to design of the last-stage table
only.

It is not necessary for all aspects of the invention to be practiced together to attain
advantages. However, when combined to yield a table-based data compression system with a
variable-rate embedded code, the result is optimally suited for collaborative video applications.
Scalability at both the encoding and decoding ends is provided by the embedded code. Speed is
provided by the use of tables in which everything is pre-computed; by using the hierarchical
tables, memory requirements can be made reasonable. Compression effectiveness is enhanced by
incorporating block transforms and entropy considerations into codebook design. Thus, the
compression is suitable for software-only applications, and the compression scheme can be
distributed over networks to make collaborative video applications widely available. These and
other features and advantages of the invention are apparent from the description below with
reference to the following drawings.



BRIEF DESCRIPTION OF THE DRAWINGS 

FIGURE 1 is a schematic illustration of an image compression system in accordance with
the invention.

FIGURE 2 is a flow chart for designing the compression system of FIG. 1 in accordance
with the present invention.

FIGURE 3 is a schematic illustration of a decision tree for designing an embedded code
for the system of FIG. 1.

FIGURE 4 is a graph indicating the performance of the system of FIG. 1.

FIGURES 5-8 are graphs indicating the performance of other embodiments of the present
invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 



In accordance with the present invention, an image compression system A1 comprises an
encoder ENC, communications lines LAN, POTS, and IDSN, and a decoder DEC, as shown in
FIG. 1. Encoder ENC is designed to compress an original image for distribution over the
communications lines.

Communications lines POTS, IDSN, and LAN differ widely in bandwidth. "Plain Old
Telephone Service" line POTS, which includes an associated modem, conveys data at a nominal
rate of 28.8 kilobaud (symbols per second). "Integrated Data Services Network" line IDSN
conveys data an order of magnitude faster. "Local Area Network" line LAN conveys data at about
10 megabits per second. Many receiving and decoding computers are connected to each line, but
only one computer is represented in FIG. 1 by decoder DEC. These computers decompress the
transmission from encoder ENC and generate a reconstructed image that is faithful to the original
image.

Encoder ENC comprises a vectorizer VEC and a hierarchical lookup table HLT, as shown
in FIG. 1. Vectorizer VEC converts a digital image into a series of image vectors Ii. Hierarchical
lookup table HLT converts the series of vectors Ii into three series of indices ZAi, ZBi, and ZCi.
Index ZAi is a high-average-precision variable-length embedded code for transmission along line
LAN, index ZBi is a moderate-average-precision variable-length embedded code for transmission
along line IDSN, and index ZCi is a low-average-precision variable-length embedded code for
transmission along line POTS. The varying precision accommodates the varying bandwidths of
the lines.

Vectorizer VEC effectively divides an image into blocks Bi of 4x4 pixels, where i is a
block index varying from 1 to the total number of blocks in the image. If the original image is not
evenly divisible by the chosen block size, additional pixels can be added to sides of the image to
make the division even in a manner known in the art of image analysis. Each block is represented
as a 16-dimensional vector Ii = (Vij), where j is a dimension index ranging from one to sixteen
(1-G, septadecimal notation) in the order shown in FIG. 1 of the pixels in block Bi. Since only one
block is illustrated in FIG. 1, the "i" index is omitted from the vector values in FIG. 1 and below.
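
A minimal Python sketch of such a vectorizer follows. Two choices here are assumptions rather than details taken from the text: padding replicates the edge pixels on the bottom and right sides, and the elements of each block are ordered row by row, whereas the actual element order is the one shown in FIG. 1.

```python
import numpy as np

def vectorize(image, block=4):
    """Split a monochrome image into block x block tiles, one vector per tile."""
    image = np.asarray(image)
    height, width = image.shape
    pad_h = (-height) % block          # rows needed to reach a multiple of `block`
    pad_w = (-width) % block           # columns needed likewise
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="edge")
    rows, cols = padded.shape
    tiles = padded.reshape(rows // block, block, cols // block, block).swapaxes(1, 2)
    return tiles.reshape(-1, block * block)   # row-major order within each tile

# Hypothetical 6x5 image; it is padded to 8x8 and yields four 16-element vectors.
image = np.arange(30).reshape(6, 5)
image_vectors = vectorize(image)
```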

Each vector element Vj is expressed in a suitable precision, e.g., eight bits, representing a
monochromatic (color or gray scale) intensity associated with the respective pixel. Vectorizer
VEC presents vector elements Vj to hierarchical lookup table HLT in adjacently numbered
odd-even pairs (e.g., V1, V2) as shown in FIG. 1.

Hierarchical lookup table HLT includes four stages S1, S2, S3, and S4. Stages S1, S2,
and S3 collectively constitute a preliminary section PRE of hierarchical lookup table HLT, while
fourth stage S4 constitutes a final section. Each stage S1, S2, S3, S4 includes a respective stage
table T1, T2, T3, T4. In FIG. 1, the tables of the preliminary section stages S1, S2, and S3 are
shown multiple times to represent the number of times they are used per image vector. For
example, table T1 receives eight pairs of image vector elements Vj and outputs eight respective
first-stage indices Wj. If the processing power is affordable, a stage can include several tables of
the same design so that the pairs of input values can be processed in parallel.

The purpose of hierarchical lookup table HLT is to map each image vector many-to-one to each
of the embedded indices ZA, ZB, and ZC. Note that the total number of distinct image vectors is
the number of distinct values a vector value Vj can assume, in this case 2^8 = 256, raised to the
number of dimensions, in this case sixteen. It is impractical to implement a table with 256^16
entries. The purpose of preliminary section PRE is to reduce the number of possible vectors that
must be compressed with minimal loss of perceptually relevant information. The purpose of
final-stage table T4 is to map the reduced number of vectors many-to-one to each set of embedded
indices. Table T4 has 2^20 entries corresponding to the concatenation of two ten-bit inputs. Tables
T2 and T3 are the same size as table T4, while table T1 is smaller with 2^16 entries. Thus, the total
number of addresses for all stages of hierarchical vector table HLT is less than four million, which
is a practical number of table entries. For computers where that is excessive, all tables can be
limited to 2^16 entries, so that the total number of table entries is about one million.
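
The address counts can be checked with a few lines of Python arithmetic. The sketch assumes, as stated above, eight-bit pixel values feeding table T1 and two ten-bit inputs for each of tables T2, T3, and T4; the variable names are illustrative.

```python
# A single table addressed directly by a 4x4 block of eight-bit pixels.
full_table_entries = 256 ** 16

# Hierarchical table: T1 takes two 8-bit values, T2-T4 take two 10-bit indices.
t1_entries = 2 ** (8 + 8)
t2_entries = t3_entries = t4_entries = 2 ** (10 + 10)
hierarchical_entries = t1_entries + t2_entries + t3_entries + t4_entries

print(f"single table : {full_table_entries:.3e} entries")
print(f"hierarchical : {hierarchical_entries:,} entries")
```

The single table would need 2^128 addresses, while the four stage tables together need 3,211,264.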

Each preliminary stage table T1, T2, T3 has two inputs and one output, while final-stage
table T4 has two inputs and three outputs. Pairs of image vector elements Vj serve as inputs to
first-stage table T1. The vector elements can represent values associated with respective pixels of an
image block. However, the invention applies as well if the vector elements Vj represent an array
of values obtained after a transformation on an image block. For example, the vector elements can
be coefficients of a discrete cosine transform applied to an image block.

On the other hand, it is computationally more efficient to embody a pre-computed
transform in the hierarchical lookup table than to compute the transform for each block of each
image being classified. Accordingly, in the present case, each input vector is in the pixel domain
and hierarchical table HLT implements a discrete cosine transform. In other words, each vector
value Vj is treated as representing a monochrome intensity value for a respective pixel of the
associated image block, while indices Wj, Xj, Yj, ZA, ZB, and ZC represent vectors in the
spatial frequency domain.

Each pair of vector values (Vj, V(j+1)) represents with a total of sixteen bits a 2x1
(column x row) block of pixels. For example, (V1, V2) represents the 2x1 block highlighted in
the leftmost replica of table T1 in FIG. 1. Table T1 maps pairs of vector element values many-to-one
to eight-bit first-stage indices Wj; in this case, j ranges from 1 to 8. Each eight-bit Wj also
represents a 2x1-pixel block. However, the precision is reduced from sixteen bits to eight bits.
For each image vector, there are sixteen vector values Vj and eight first-stage indices Wj.



The eight first-stage indices Wj are combined into four adjacent odd-even second-stage
input pairs; each pair (Wj, W(j+1)) represents in sixteen-bit precision the 2x2 block constituted by
the two 2x1 blocks represented by the individual first-stage indices Wj. For example, (W1, W2)
represents the 2x2 block highlighted in the leftmost replica of table T2 in FIG. 1. Second-stage
table T2 maps each second-stage input pair of first-stage indices many-to-one to a second-stage
index Xj. For each image input vector, the eight first-stage indices yield four second-stage indices
X1, X2, X3, and X4. Each of the second-stage indices Xj represents a 2x2 image block with
eight-bit precision.

The four second-stage indices Xj are combined into two third-stage input pairs (X1, X2)
and (X3, X4), each representing a 4x2 image block with sixteen-bit precision. For example,
(X1, X2) represents the upper half block highlighted in the left replica of table T3, while (X3, X4)
represents the lower half block highlighted in the right replica of table T3 in FIG. 1. Third-stage
table T3 maps each third-stage input pair many-to-one to eight-bit third-stage indices Y1 and Y2.
These two indices Y1 and Y2 are the output of preliminary section PRE in response to a single
image vector.

The two third-stage indices are paired to form a fourth-stage input pair (Y1, Y2) that
expresses an entire image block with sixteen-bit precision. Fourth-stage table T4 maps
fourth-stage input pairs many-to-one to each of the embedded indices ZA, ZB, and ZC. For an entire
image, there are many image vectors Ii, each yielding three respective output indices ZAi, ZBi,
and ZCi. The specific relationship between inputs and outputs is shown in Table I below as well
as in FIG. 1.



WO 97/36376                                                        PCT/US97/04879

TABLE I: Lookup Table Mapping

Lookup Table    Inputs      Output
T1              V1, V2      W1
"               V3, V4      W2
"               V5, V6      W3
"               V7, V8      W4
"               V9, VA      W5
"               VB, VC      W6
"               VD, VE      W7
"               VF, VG      W8
T2              W1, W2      X1
"               W3, W4      X2
"               W5, W6      X3
"               W7, W8      X4
T3              X1, X2      Y1
"               X3, X4      Y2
T4              Y1, Y2      ZA, ZB, ZC
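
The mapping of Table I can be sketched in Python as a cascade of pairwise lookups. This is a structural illustration only: the tables are filled with a placeholder rule (the integer average of the two inputs) rather than designed codebooks, all indices are taken to be eight bits as in the stage-by-stage description above, and the final table returns a single index instead of the three outputs ZA, ZB, and ZC.

```python
import numpy as np

def make_placeholder_table():
    """65,536-entry table addressed by (left << 8) | right; NOT a designed codebook."""
    left, right = np.meshgrid(np.arange(256), np.arange(256), indexing="ij")
    return ((left + right) // 2).astype(np.uint8).reshape(-1)

def lookup(table, left, right):
    """Concatenate two 8-bit inputs into a 16-bit address and read the table."""
    return table[(left.astype(np.uint32) << 8) | right]

def encode_block(v, t1, t2, t3, t4):
    """Map a 16-element block vector to one index using 8 + 4 + 2 + 1 lookups."""
    v = np.asarray(v, dtype=np.uint8)
    w = lookup(t1, v[0::2], v[1::2])   # (V1,V2)...(VF,VG) -> W1..W8
    x = lookup(t2, w[0::2], w[1::2])   # (W1,W2)...(W7,W8) -> X1..X4
    y = lookup(t3, x[0::2], x[1::2])   # (X1,X2),(X3,X4)   -> Y1, Y2
    z = lookup(t4, y[0:1], y[1:2])     # (Y1,Y2)           -> final index
    return int(z[0])

t1 = t2 = t3 = t4 = make_placeholder_table()
flat_block_index = encode_block([100] * 16, t1, t2, t3, t4)
ramp_block_index = encode_block(list(range(16)), t1, t2, t3, t4)
```

No arithmetic on pixel values is performed at encoding time apart from forming the table addresses; everything else is a memory read.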



Decoder DEC is designed for decompressing an image received from encoder ENC over a
LAN line. Decoder DEC includes a code pruner 51, a decode table 52, and an image
assembler 53. Code pruner 51 performs on the receiving end the function that the multiple
outputs from stage S4 perform on the transmitting end: allowing a tradeoff between fidelity and
bit rate. Code pruner 51 embodies the criteria for pruning index ZA to obtain indices ZB and ZC;
alternatively, code pruner 51 can pass index ZA unpruned. As explained further below, the code
pruning effectively reverts to an earlier version of the greedily grown tree. In general, the pruned
codes generated by a code pruner need not match those generated by the encoder. For example,
the code pruner could provide a larger set of alternatives.

If a fixed-length compression code is used instead of a variable-length code, the pruning
function can merely involve dropping a fixed number of least-significant bits from the code. This
truncation can take place at the encoder at the hierarchical table output and/or at the decoder. A
more sophisticated approach is to prune selectively based on an entropy constraint.

Decode table 52 is a lookup table that converts codes to reconstruction vectors. Since the
code indices represent codebook vectors in a spatial frequency domain, decode table 52
implements a pre-computed inverse discrete cosine transform so that the reconstruction vectors are
in a pixel domain. Image assembler 53 converts the reconstruction vectors into blocks and
assembles the reconstructed image from the blocks.

Preferably, decoder DEC is implemented in software on a receiving computer. The
software allows the fidelity versus bit rate tradeoff to be selected. The software then sets code
pruner 51 according to the selected code precision. The software includes separate tables for each
setting of code pruner 51. Only the table corresponding to the current setting of code pruner 51 is
loaded into fast memory (RAM). Thus, lookup table 52 is smaller when pruning is activated.
The pruning function therefore allows fast memory to be conserved to match: 1) the capacity of the
receiving computer; or 2) the allotment of local memory to the decoding function.
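
For the fixed-length case, the interplay of the code pruner and the per-setting decode tables can be sketched in Python. The sketch makes two assumptions that are not taken from the text: the pruned decode table is derived by averaging, without weighting, the reconstruction vectors that share a prefix, and the four-entry full table holds hypothetical values. A designed system would instead store the reconstruction vectors of the corresponding tree nodes.

```python
import numpy as np

def prune(index, drop_bits):
    """Drop a fixed number of least-significant bits from a fixed-length index."""
    return index >> drop_bits

def build_pruned_decode_table(full_table, drop_bits):
    """Derive a smaller decode table for a given pruner setting (assumed rule)."""
    full_table = np.asarray(full_table, dtype=np.float64)
    group = 1 << drop_bits                 # number of full indices per pruned index
    return full_table.reshape(-1, group, full_table.shape[1]).mean(axis=1)

# Hypothetical 2-bit code with two-dimensional reconstruction vectors.
full_table = np.array([[0, 0], [2, 2], [10, 10], [12, 12]])
pruned_table = build_pruned_decode_table(full_table, drop_bits=1)

received_index = 3
reconstruction = pruned_table[prune(received_index, drop_bits=1)]
```

The pruned table has half as many entries as the full table, which is the memory saving described above.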

A table design method M1, flow charted in FIG. 2, is executed for each stage of
hierarchical lookup table HLT, with some variations depending on whether the stage is the first
stage S1, an intermediate stage S2, S3, or the final stage S4. For each stage, method M1 includes
a codebook design procedure 10 and a table fill-in procedure 20. For each stage, fill-in procedure
20 must be preceded by the respective codebook design procedure 10. However, there is no
chronological order imposed between stages; for example, table T3 can be filled in before the
codebook for table T2 is designed.

For first-stage table T1, codebook design procedure 10 begins with the selection of
training images at step 11. The training images are selected to be representative of the type or
types of images to be compressed by system A1. If system A1 is used for general purpose image
compression, the selection of training images can be quite diverse. If system A1 is used for a
specific type of image, e.g., line drawings or photos, then the training images can be a selection of
images of that type. A less diverse set of training images allows more faithful image reproduction
for images that are well matched to the training set, but less faithful image reproduction for images
that are not well matched to the training set.

The training images are divided into 2x1 blocks, which are represented by two-dimensional
vectors (Vj, V(j+1)) in a spatial pixel domain at step 12. For each of these vectors, Vj
characterizes the intensity of the left pixel of the 2x1 block and V(j+1) characterizes the intensity
of the right pixel of the 2x1 block.

In alternative embodiments of the invention, codebook design and table fill-in are
conducted in the spatial pixel domain. For these pixel domain embodiments, steps 13, 23, and 25 are
not executed for any of the stages. A problem with the pixel domain is that the terms of the vector
are of equal importance: there is no reason to favor the intensity of the left pixel over the intensity
of the right pixel, and vice versa. For table T1 to reduce data while preserving as much
information relevant to classification as possible, it is important to express the information so that
more important information is expressed independently of less important information.

For the design of the preferred first-stage table T1, a discrete cosine transform is applied at
step 13 to convert the two-dimensional vectors in the pixel domain into two-dimensional vectors in
a spatial frequency domain. The first value of this vector corresponds to the average intensity of
the left and the right pixels, while the second value of the vector corresponds to the difference in
intensities between the left and the right pixels.
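
As a minimal sketch of step 13 for a single 2x1 block, the following assumes an orthonormal
two-point DCT (the normalization is an assumption; the text does not specify one). The function
names `dct_2x1` and `idct_2x1` are illustrative only.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # assumed orthonormal scaling

def dct_2x1(left, right):
    """Two-point DCT: returns (average-like term, left-right difference term)."""
    return (SCALE * (left + right), SCALE * (left - right))

def idct_2x1(f0, f1):
    """Inverse of dct_2x1: recovers (left, right) pixel intensities."""
    return (SCALE * (f0 + f1), SCALE * (f0 - f1))

# Two neighbouring pixels with similar intensities: nearly all of the
# energy lands in the first (average) term.
avg_term, diff_term = dct_2x1(100, 104)
left, right = idct_2x1(avg_term, diff_term)
```

With this scaling the first term is the pixel sum divided by the square root of two, so it is
proportional to the average intensity rather than equal to it.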

From the perspective of a human perceiver, expressing the 2x1 blocks of an image in a
spatial frequency domain divides the information in the image into a relatively important term
(average intensity) and a relatively unimportant term (difference in intensity). An image
reconstructed on the basis of the average intensity alone would appear less distorted than an image
reconstructed on the basis of the left or right pixels alone; either of the latter would yield an image
which would appear less distorted than an image reconstructed on the basis of intensity differences
alone. For a given average precision, perceived distortion can be reduced by allotting more bits to
the more important dimension and fewer to the less important dimension.

The codebook is designed at step 14. The codebook indices are preferably fixed length, in
this case ten bits. Maximal use of the fixed precision is attained by selecting the associated power
of two as the number of codebook vectors. In the present case, the number of codebook vectors
for table T1 is to be $2^{10} = 1024$.

Ideally, step 14 would determine the set of 1024 vectors that would yield the minimum
distortion for images having the expected probability distribution of 2x1 input vectors. While the
problem of finding the ideal codebook vectors can be formulated, it cannot be solved generally by
numerical methods. However, there is an iterative procedure that converges from an essentially
arbitrary set of "seed" vectors toward a "good" set of codebook vectors. This procedure is known
alternatively as the "cluster compression algorithm", the "Linde-Buzo-Gray" algorithm, and the
"generalized Lloyd algorithm" (GLA).

The procedure begins with a set of seed vectors. The training set of 2x1 spatial frequency
vectors generated from the training images is assigned to the seed vectors on a proximity basis.
This assignment defines clusters of training vectors around each of the seed vectors. The
weighted mean vector for each cluster replaces the respective seed vector. The mean vectors
provide better distortion performance than the seed vectors; a first distortion value is determined
for these first mean vectors.

Further improvement is achieved by re-clustering the training vectors around the
previously determined mean vectors on a proximity basis, and then finding new mean vectors for
the clusters. This process yields a second distortion value less than the first distortion value. The
difference between the first and second distortion values is the first distortion reduction value.
The process can be iterated to achieve successive distortion values and distortion reduction values.
The distortion values and the distortion reduction values progressively diminish. In general, the
distortion reduction value does not reach zero. Instead, the iterations can be stopped when the
distortion reduction values fall below a predetermined threshold, i.e., when further improvements
in distortion are not worth the computational effort.
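
The iteration just described can be sketched as follows. This is a minimal illustration, not the
patent's implementation: it assumes NumPy, uses unweighted squared error for both proximity and
distortion (a perceptual weighting could be substituted), and the names `gla`, `threshold`, and
`max_iter` are hypothetical.

```python
import numpy as np

def gla(training, seeds, threshold=1e-3, max_iter=100):
    """Iterate cluster/mean steps until the distortion reduction is below threshold.

    training: (n, d) array of training vectors; seeds: (k, d) initial codebook.
    Returns the final codebook and the last measured mean distortion.
    """
    training = np.asarray(training, dtype=float)
    codebook = np.array(seeds, dtype=float)
    previous = None
    for _ in range(max_iter):
        # Assign every training vector to its nearest codebook vector.
        d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        distortion = d2[np.arange(len(training)), nearest].mean()
        # Replace each codebook vector by the mean of its cluster.
        for j in range(len(codebook)):
            members = training[nearest == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
        # Stop once the distortion reduction falls below the threshold.
        if previous is not None and previous - distortion < threshold:
            break
        previous = distortion
    return codebook, distortion

training = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
codebook, distortion = gla(training, seeds=[[1, 1], [9, 9]])
```

Here the distortion is measured at the start of each pass, before the means are updated, so the
stopping test compares two successive assignments.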

One restriction of the GLA algorithm is that every seed vector should have at least one
training vector assigned to it. To guarantee this condition is met, Linde, Buzo, and Gray
developed a "splitting" technique for the GLA. See Y. Linde, A. Buzo, and R.M. Gray, "An
Algorithm for Vector Quantizer Design", IEEE Transactions on Communications, COM-28:84-95,
January, 1980, and An Introduction to Data Compression by Khalid Sayood, Morgan
Kaufmann Publishers, Inc., San Francisco, California, 1996, pp. 222-228.

This splitting technique begins by determining a mean for the set of training vectors. This
can be considered the result of applying a single GLA iteration to a single arbitrary seed vector as
though the codebook of interest were to have one vector. The mean vector is perturbed to yield a
second "perturbed" vector. The mean and perturbed vectors serve as the two seed vectors for the
next iteration of the splitting technique. The perturbation is selected to guarantee that some
training vectors will be assigned to each of the two seed vectors. The GLA is then run on the two
seed vectors until the distortion reduction value falls below threshold. Then each of the two
resulting mean vectors is perturbed to yield four seed vectors for the next iteration of the splitting
technique. The splitting technique is iterated until the desired number, in this case 1024, of
codebook vectors is attained.
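
A compact sketch of the splitting technique follows, assuming NumPy and unweighted squared
error. The additive perturbation `eps` is an assumption chosen for simplicity; it does not by
itself guarantee that every seed attracts training vectors, so an empty cluster simply keeps its
seed here. The names `lloyd` and `split_design` are hypothetical, and `target_size` is expected
to be a power of two.

```python
import numpy as np

def lloyd(training, codebook, threshold=1e-6, max_iter=50):
    """Plain GLA passes on a fixed-size codebook."""
    previous = np.inf
    for _ in range(max_iter):
        d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        distortion = d2.min(axis=1).mean()
        # Cluster means replace the seeds; an empty cluster keeps its seed.
        codebook = np.array([training[nearest == j].mean(axis=0)
                             if np.any(nearest == j) else codebook[j]
                             for j in range(len(codebook))])
        if previous - distortion < threshold:
            break
        previous = distortion
    return codebook

def split_design(training, target_size, eps=0.01):
    """Grow a codebook 1 -> 2 -> 4 -> ... by perturbing and re-running the GLA."""
    training = np.asarray(training, dtype=float)
    codebook = training.mean(axis=0, keepdims=True)  # one-vector codebook
    while len(codebook) < target_size:
        # Each vector and its perturbed copy seed the next, doubled codebook.
        codebook = np.vstack([codebook, codebook + eps])
        codebook = lloyd(training, codebook)
    return codebook

rng = np.random.default_rng(0)
training = rng.normal(size=(500, 2))
codebook = split_design(training, target_size=8)
```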

If the reconstructed images are to be viewed by humans and a perceptual profile is
available, the distortion and proximity measures used in step 14 can be perceptually weighted. For
example, lower spatial frequency terms can be given more weight than higher spatial frequency
terms. In addition, since this is vector rather than scalar quantization, interactive effects between
the spatial frequency dimensions can be taken into account. Unweighted measures can be used if
the transform space is perceptually linear, if no perceptual profile is available, or if the decompressed
data is to be subjected to further numeric processing before the image is presented for human viewing.

The codebook designed in step 14 comprises a set of 1024 2x1 codebook vectors in the
spatial frequency domain. These are arbitrarily assigned respective ten-bit indices at step 15. This
completes codebook design procedure 10 of method M1 for stage S1.

Fill-in procedure 20 for stage S1 begins with step 21 of generating each distinct address to
permit its contents to be determined. In the preferred embodiment, values are input into each of
the tables in pairs. In alternative embodiments, some tables or all tables can have more inputs.
For each table, the number of addresses is the product of the number of possible distinct values
that can be received at each input. Typically, the number of possible distinct values is a power of
two. The inputs to table T1 receive an eight-bit input Vj and an eight-bit input V(J+1); the number of
addresses for table T1 is thus $2^8 \times 2^8 = 2^{16} = 65{,}536$. The steps following step 21 are designed to
enter at each of these addresses one of the $2^8 = 256$ table T1 indices Wj.

Each input Vj is a scalar value corresponding to an intensity assigned to a respective pixel
of an image. These inputs are concatenated at step 24 in pairs to define a two-dimensional vector
(Vj, V(J+1)) in a spatial pixel domain. (Steps 22 and 23 are bypassed for the design of first-stage
table T1.)

For a meaningful proximity measurement, the input vectors must be expressed in the same
domain as the codebook vectors, i.e., a two-dimensional spatial frequency domain. Accordingly,
a DCT is applied at step 25 to yield a two-dimensional vector in the spatial frequency domain of
the table T1 codebook.

The table T1 codebook vector closest to this input vector is determined at step 26. The
proximity measure is unweighted mean square error. Better performance is achieved using an
objective measure like unweighted mean square error as the proximity measure during table
building rather than a perceptually weighted measure. On the other hand, an unweighted
proximity measurement is not required in general for this step. Preferably, however, the
measure used during table fill-in at step 26 is weighted less on the average than the measures
used in step 14 for codebook design.

At step 27, the index Wj assigned to the closest codebook vector at step 15 is then entered
as the contents at the address corresponding to the input pair (Vj, V(J+1)). During operation of
system A1, it is this index that is output by table T1 in response to the given pair of input values.
Once indices Wj are assigned to all 65,536 addresses of table T1, method M1 design of table T1
is complete.
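
Steps 21 through 27 for table T1 can be sketched as below. This is an illustration under stated
assumptions: NumPy, an orthonormal two-point DCT, unweighted squared error, and a codebook
whose row number serves as its index. The function name `fill_first_stage_table` and the tiny
four-vector example codebook are hypothetical; a real table would use the codebook from step 14.

```python
import numpy as np

SCALE = 1.0 / np.sqrt(2.0)  # assumed orthonormal two-point DCT scaling

def fill_first_stage_table(codebook):
    """Return a 256x256 table mapping (left, right) pixel pairs to codebook indices."""
    codebook = np.asarray(codebook, dtype=float)
    table = np.empty((256, 256), dtype=np.uint16)
    right = np.arange(256)
    for left in range(256):  # step 21: visit every address, one row at a time
        # Step 25: DCT of each (left, right) pair in this row.
        freq = np.stack([SCALE * (left + right), SCALE * (left - right)], axis=1)
        # Step 26: nearest codebook vector under unweighted squared error.
        d2 = ((freq[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        # Step 27: store the winning index at the address.
        table[left] = d2.argmin(axis=1)
    return table

# Hypothetical codebook: frequency-domain images of four extreme pixel pairs.
pairs = np.array([[0, 0], [255, 255], [0, 255], [255, 0]], dtype=float)
codebook = np.stack([SCALE * (pairs[:, 0] + pairs[:, 1]),
                     SCALE * (pairs[:, 0] - pairs[:, 1])], axis=1)
table = fill_first_stage_table(codebook)
```

Processing one row of addresses at a time keeps the intermediate distance array small even
for a 1024-vector codebook.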

For second-stage table T2, the codebook design begins with step 11 of selecting training
images, just as for first-stage table T1. The training images used for design of the table T1
codebook can be used also for the design of the second stage codebook. At step 12, the training
images are divided into 2x2 pixel blocks; the 2x2 pixel blocks are expressed as image vectors in
four-dimensional vector space in a pixel domain; in other words, each of four vector values
characterizes the intensity associated with a respective one of the four pixels of the 2x2 pixel
block.

At step 13, the four-dimensional vectors are converted using a DCT to a spatial frequency
domain. Just as a four-dimensional pixel-domain vector can be expressed as a 2x2 array of
pixels, a four-dimensional spatial frequency domain vector can be expressed as a 2x2 array of
spatial frequency functions:

    F00    F01
    F10    F11

The four values of the spatial frequency domain respectively represent: F00, an average
intensity for the 2x2 pixel block; F01, an intensity difference between the left and right halves of
the block; F10, an intensity difference between the top and bottom halves of the block; and
F11, a diagonal intensity difference. The DCT conversion is lossless (except for small rounding
errors) in that the spatial pixel domain can be retrieved by applying an inverse DCT to the spatial
frequency domain vector.
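
The meaning of the four terms and the lossless round trip can be checked with a short sketch.
It assumes NumPy and an orthonormal separable 2x2 DCT, under which F00 equals the pixel sum
divided by two (proportional to, not equal to, the average). The names `dct_2x2` and
`idct_2x2` are illustrative.

```python
import numpy as np

# Orthonormal two-point DCT matrix (assumed normalization).
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def dct_2x2(block):
    """Separable 2x2 DCT: rows index vertical frequency, columns horizontal."""
    return H @ np.asarray(block, dtype=float) @ H.T

def idct_2x2(coeffs):
    """Inverse transform; H is orthogonal, so its transpose inverts it."""
    return H.T @ np.asarray(coeffs, dtype=float) @ H

block = np.array([[10, 20],
                  [30, 40]])
F = dct_2x2(block)
restored = idct_2x2(F)
# F[0, 0]: sum of all pixels / 2; F[0, 1]: (left - right) / 2;
# F[1, 0]: (top - bottom) / 2;   F[1, 1]: diagonal difference / 2.
```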

The four-dimensional frequency-domain vectors serve as the training sequence for second
stage codebook design by the LBG/GLA algorithm. The proximity and distortion measures can
be the same as those used for design of the codebook for table T1. The difference is that for table
T2, the measurements are performed in a four-dimensional space instead of a two-dimensional
space. Eight-bit indices Xj are assigned to the codebook vectors at step 15, completing codebook
design procedure 10 of method M1.

Fill-in procedure 20 for table T2 involves entering indices Xj as the contents of each of the
table T2 addresses. As shown in FIG. 1, the inputs to table T2 are to be ten-bit indices Wj from
the outputs of table T1. These are received in pairs so that there are $2^{10} \times 2^{10} = 2^{20} = 1{,}048{,}576$
addresses for table T2. Each of these must be filled with a respective one of $2^{10} = 1024$ ten-bit
table T2 indices Xj.

Looking ahead to step 26, the address entries are to be determined using a proximity
measure in the space in which the table T2 codebook is defined. The table T2 codebook is defined
in a four-dimensional spatial frequency domain space. However, the address inputs to table T2
are pairs of indices (Wj, W(J+1)) for which no meaningful metric can be applied. Each of these
indices corresponds to a table T1 codebook vector. Decoding indices (Wj, W(J+1)) at step 22
yields the respective table T1 codebook vectors, which are defined in a metric space.

However, the table T1 codebook vectors are defined in a two-dimensional space, whereas
four-dimensional vectors are required by step 26 for stage S2. While two two-dimensional
frequency-domain vectors can be concatenated to yield a four-dimensional vector, the result is not
meaningful in the present context: the result would have two values corresponding to average
intensities, and two values corresponding to left-right difference intensities; as indicated above,
what would be required is a single average intensity value, a single left-right difference value, a
single top-bottom difference value, and a single diagonal difference value.

Since there is no direct, meaningful method of combining two spatial frequency domain
vectors to yield a higher dimensional spatial frequency domain vector, an inverse DCT is applied
at step 23 to each of the pair of two-dimensional table T1 codebook vectors yielded at step 22.
The inverse DCT yields a pair of two-dimensional pixel-domain vectors that can be meaningfully
concatenated to yield a four-dimensional vector in the spatial pixel domain representing a 2x2 pixel
block. A DCT transform can be applied, at step 25, to this four-dimensional pixel domain vector
to yield a four-dimensional spatial frequency domain vector. This four-dimensional spatial
frequency domain vector is in the same space as the table T2 codebook vectors. Accordingly, a
proximity measure can be meaningfully applied at step 26 to determine the closest table T2
codebook vector.
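
The decode, inverse-transform, concatenate, transform, and search sequence for one table T2
address can be sketched as follows. The sketch assumes NumPy, orthonormal DCTs, unweighted
squared error, and that the two 2x1 blocks are stacked as the top and bottom rows of the 2x2
block (the text does not fix this layout). The function `t2_entry` and the small example
codebooks are hypothetical.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)  # orthonormal 2-point DCT

def t2_entry(w_top, w_bottom, t1_codebook, t2_codebook):
    """Compute the table T2 entry for the address (w_top, w_bottom)."""
    # Step 22: decode the indices into table T1 codebook vectors (frequency domain).
    f_top, f_bottom = t1_codebook[w_top], t1_codebook[w_bottom]
    # Step 23: inverse two-point DCT back to (left, right) pixel pairs.
    p_top, p_bottom = H.T @ f_top, H.T @ f_bottom
    # Step 24: concatenate into a 2x2 pixel block (row stacking is an assumption).
    block = np.vstack([p_top, p_bottom])
    # Step 25: 2x2 DCT, flattened in the order (F00, F01, F10, F11).
    f4 = (H @ block @ H.T).ravel()
    # Step 26: nearest table T2 codebook vector under unweighted squared error.
    return int(((t2_codebook - f4) ** 2).sum(axis=1).argmin())

# Hypothetical codebooks built from known pixel patterns.
t1_codebook = np.array([H @ np.array([0.0, 0.0]), H @ np.array([100.0, 100.0])])
t2_blocks = [np.zeros((2, 2)), np.full((2, 2), 100.0),
             np.array([[100.0, 100.0], [0.0, 0.0]])]
t2_codebook = np.array([(H @ b @ H.T).ravel() for b in t2_blocks])

entry = t2_entry(1, 0, t1_codebook, t2_codebook)  # bright top row, dark bottom row
```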

The index Xj assigned at step 15 to the closest table T2 codebook vector is assigned at step
27 to the address under consideration. When indices Xj are assigned to all table T2 addresses,
table design method M1 for table T2 is complete.

Table design method M1 for intermediate stage S3 is similar to that for intermediate stage
S2 except that the dimensionality is doubled. Codebook design procedure 10 can begin with the
selection of the same or similar training images at step 11. At step 12, the images are converted to
eight-dimensional pixel-domain vectors, each representing a 4x2 pixel block of a training image.

A DCT is applied at step 13 to the eight-dimensional pixel-domain vector to yield an
eight-dimensional spatial frequency domain vector, which can be expressed as an array of spatial
frequency functions:

    F00    F01    F02    F03
    F10    F11    F12    F13

Although basis functions F00, F01, F10, and F11 have roughly the same meanings as
they do for a 2x2 array, once the array size exceeds 2x2, it is no longer adequate to describe the
basis functions in terms of differences alone. Instead, the terms express different spatial
frequencies. The functions F00, F01, F02, F03 in the first row represent increasingly greater
horizontal spatial frequencies. The functions F00, F10 in the first column represent increasingly
greater vertical spatial frequencies. The remaining functions can be characterized as representing
two-dimensional spatial frequencies that are products of horizontal and vertical spatial frequencies.

Human perceivers are relatively insensitive to higher spatial frequencies. Accordingly, a
perceptual proximity measure might assign a relatively low (less than unity) weight to high spatial
frequency terms such as F03 and F13. By the same reasoning, a relatively high (greater than
unity) weight can be assigned to low spatial frequency terms.

The perceptual weighting is used in the proximity and distortion measures during
codebook assignment in step 14. Again, the splitting variation of the GLA is used. Once the
256-word codebook is determined, indices Yj are assigned at step 15 to the codebook vectors.
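
A perceptually weighted squared-error measure of the kind described can be sketched as below.
The weight values are purely hypothetical placeholders (no perceptual profile is given in the
text); they only illustrate weights above unity for low spatial frequencies and below unity for
high ones.

```python
import numpy as np

# Hypothetical weights for the 2x4 coefficient array
# (rows: vertical frequency, columns: horizontal frequency).
WEIGHTS = np.array([[2.0, 1.5, 1.0, 0.5],
                    [1.5, 1.0, 0.75, 0.5]])

def weighted_distortion(x, y, weights=WEIGHTS):
    """Weighted squared error between two 2x4 frequency-domain vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float((weights * diff ** 2).sum())

reference = np.zeros((2, 4))
low_freq_error = np.zeros((2, 4))
low_freq_error[0, 0] = 1.0   # unit error in F00
high_freq_error = np.zeros((2, 4))
high_freq_error[0, 3] = 1.0  # unit error in F03

d_low = weighted_distortion(reference, low_freq_error)
d_high = weighted_distortion(reference, high_freq_error)
```

The same unit error costs more in F00 than in F03, so codebook design under this measure spends
its precision on the low-frequency terms.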

Table fill-in procedure 20 for table T3 is similar to that for table T2. Each address
generated at step 21 corresponds to a pair (Xj, X(J+1)) of indices. These are decoded at step 22
to yield a pair of four-dimensional table T2 spatial-frequency domain codebook vectors.
An inverse DCT is applied to these two vectors to yield a pair of four-dimensional pixel-domain
vectors at step 23. The pixel domain vectors represent 2x2 pixel blocks which are concatenated at
step 24 so that the resulting eight-dimensional vector in the pixel domain corresponds to a 4x2
pixel block. At step 25, a DCT is applied to the eight-dimensional pixel domain vector to yield an
eight-dimensional spatial frequency domain vector in the same space as the table T3 codebook
vectors.

The closest table T3 codebook vector is determined at step 26, preferably using an
unweighted proximity measure such as mean-square error. The table T3 index Yj assigned at step
15 to the closest table T3 codebook vector is entered at the address under consideration at step 27.
Once corresponding entries are made for all table T3 addresses, design of table T3 is complete.

Table design method M1 for final-stage table T4 can begin with the same or a similar set of
training images at step 11. The training images are expressed, at step 12, as a sequence of
sixteen-dimensional pixel-domain vectors representing 4x4 pixel blocks (having the form of Bi in
FIG. 1). A DCT is applied at step 13 to the pixel domain vectors to yield respective
sixteen-dimensional spatial frequency domain vectors, the statistical profile of which is used to build the
final-stage table T4 codebook.

Instead of building a standard table-based VQ codebook as for stages S1, S2, and S3,
step 16 builds a tree-structured codebook. The main difference between tree-structured codebook
design and the full-search codebook design used for the preliminary stages is that most of the
codebook vectors are determined using only a respective subset of the training vectors.

As in the splitting variation, the mean, indicated at A in FIG. 3, of the training vectors is
determined. For stage S4, the training vectors are in a sixteen-dimensional spatial frequency
domain. The mean is perturbed to yield seed vectors for a two-vector codebook. The GLA is run
to determine the codebook vectors for the two-vector codebook.

In a departure from the design of the preliminary section codebooks, the clustering of
training vectors to the two-vector-codebook vectors is treated as permanent. Indices 0 and 1 are
assigned respectively to the two-vector-codebook vectors, as shown in FIG. 3. Each of the
two-vector-codebook vectors is perturbed to yield two pairs of seed vectors. For each pair, the GLA
is run using only the training vectors assigned to its parent codebook vector. The result is a pair
of child vectors for each of the original two-vector-codebook vectors. The child vectors are
assigned indices having as a prefix the index of the parent vector and a one-bit suffix. The child
vectors of the codebook vector assigned index 0 are assigned indices 00 and 01, while the
child vectors of the codebook vector assigned index 1 are assigned indices 10 and 11. Once again,
the assignment of training vectors to the four child vectors is treated as permanent.

There are "evenly-growing" and "greedily-growing" variations of decision-tree growth. In
either case, it is desirable to overgrow the tree and then prune back to a tree of the desired
precision. In the evenly-growing variation, both sets of children are retained and used in selecting
seeds for the next generation. Thus, the tree is grown generation-by-generation. Growing an
evenly-grown tree to the maximum possible depth of the desired variable-length code can consume
more memory and computation time than is practical.

Less growing and less pruning are required if the starting point for the pruning has the
same general shape as the tree that results from the pruning. Such a tree can be obtained by the
preferred "greedily-growing" variation, in which growth is node-by-node. In general, the growth
is uneven, e.g., one sibling can have grandchildren before the other sibling has children. The
determination of which childless node is the next to be grown involves computing a joint measure
D + λH for the increase in distortion D and in entropy H that would result from a growth at each
childless node. Growth is promoted only at the node with the lowest joint measure. Note that the
joint measure is only used to select the node to be grown; in the preferred embodiment, entropy is
not taken into account in the proximity measure used for clustering. However, the invention
provides for an entropy-constrained proximity measure.

In the example, joint entropy and distortion measures are determined for two three-vector
codebooks, each including an aunt and two nieces. One three-vector codebook includes vectors 0,
10, and 11; the other three-vector codebook includes vectors 1, 00, and 01. The three-vector
codebook with the lower joint measure supersedes the two-vector codebook. Thus, the table T4
codebook is grown one vector at a time (instead of doubling each iteration as with the splitting
procedure). In addition, the parent that was replaced by her children is assigned an ordinal. In the
example of FIG. 3, the lower distortion is associated with the children of vector 1. The
three-vector codebook consists of vectors 11, 10, and 0. The ordinal 1 (in parentheses in FIG. 3) is
assigned to the replaced parent vector 1. This ordinal is used in selecting compression scaling.

In the next iteration of the tree-growing procedure, the two new codebook vectors, e.g.,
11 and 10, are each perturbed so that two more pairs of seed vectors are generated. The GLA is
run on each pair using only training vectors assigned to the respective parent; the result is two
pairs of proposed new codebook vectors (111, 110) and (101, 100). Distortion measures are
obtained for each pair. These distortion measures are compared with the already obtained
distortion measure for the vector, e.g., 0, common to the two-vector and three-vector codebooks.
The tree is grown from the codebook vector for which the growth yields the least distortion. In
the example of FIG. 3, the tree is grown from vector 0, which is assigned the ordinal 2.

With each iteration of the growing technique, one parent vector is replaced by two child
vectors, so that the next level codebook has one more vector than the preceding level codebook.
Indices for the child vectors are formed by appending 0 and 1 respectively to the end of the index
for the parent vector. As a result, the indices for each generation are one bit longer than the indices
for the preceding generation. The code thus generated is a "prefix" code. FIG. 3 shows a tree
after nine iterations of the tree-growing procedure.
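
The bookkeeping of greedy growth (node selection by a joint measure of the form D + lam * H,
ordinal assignment, and index formation) can be sketched as below. The (D, H) numbers are
hypothetical stand-ins for values that would come from running the GLA on each childless node,
and the names `pick_node`, `grow`, and `is_prefix_free` are illustrative.

```python
def pick_node(candidates, lam):
    """Pick the childless node with the lowest joint measure D + lam * H.

    candidates maps a node index (bit string) to hypothetical (D, H) values.
    """
    return min(candidates, key=lambda n: candidates[n][0] + lam * candidates[n][1])

def grow(leaves, parent):
    """Replace leaf `parent` by two children whose indices append 0 and 1."""
    remaining = [leaf for leaf in leaves if leaf != parent]
    return sorted(remaining + [parent + "0", parent + "1"])

def is_prefix_free(codes):
    """True if no code is a proper prefix of another (a prefix code)."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

leaves = ["0", "1"]
ordinals = {}
# Hypothetical measurements for two growth iterations.
rounds = [
    {"0": (5.0, 1.0), "1": (3.0, 1.0)},
    {"0": (2.0, 1.0), "10": (4.0, 1.0), "11": (4.5, 1.0)},
]
for ordinal, candidates in enumerate(rounds, start=1):
    node = pick_node(candidates, lam=0.5)
    ordinals[node] = ordinal      # the replaced parent receives the ordinal
    leaves = grow(leaves, node)
```

With these placeholder values node 1 is grown first and node 0 second, mirroring the ordinals in
the FIG. 3 example.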

Optionally, tree growth can terminate when a tree with the desired number of end nodes
corresponding to codebook vectors is achieved. However, the resulting tree is typically not
optimal. To obtain a more optimal tree, growth continues well past the size required for the
desired codebook. For example, the average bit length for codes associated with the overgrown
tree can be twice the average bit length desired for the tree to be used for the maximum precision
code. The overgrown tree can be pruned node-by-node using a joint measure of distortion and
entropy until a tree of the desired size is achieved. Note that the pruning can also be used to obtain
an entropy-shaped tree from an evenly overgrown tree.

Lower precision trees can be designed using the ordinals assigned during greedy growing.
There may be some gaps in the numbering sequence, but a numerical order is still present to guide
selection of nodes for the lower-precision trees. Preferably, however, the high-precision tree is
pruned using the joint measure of distortion and entropy to provide better low-precision trees. To
the extent of the pruning, ordinals can be reassigned to reflect pruning order rather than the
growing order. If the pruning is continued to the common ancestor and its children, then all
ordinals can be reassigned according to pruning order.

The full-precision-tree codebook provides lower distortion and a higher bit rate than any of
its predecessor codebooks. If a lower bit rate is desired, one can select a suitable ordinal and
prune all codebook vectors with higher ordinals. The resulting predecessor codebook provides a
near optimal tradeoff of distortion and bit rate. In the present case, a 1024-vector codebook is
built, and its indices are used for index ZA. For index ZB, the tree is pruned back to ordinal 512
to yield a lower bit rate. For ZC, the tree is pruned back to ordinal 256 to yield an even lower
bit rate. Note that the code pruner 51 of decoder DEC has information regarding the ordinals to
allow it to make appropriate bit-rate versus distortion tradeoffs.

While indices ZA, ZB, and ZC could be entered in sections of respective addresses of table
T4, doing so would not be memory efficient. Instead ZC, Zb, and Za are stored. Zb indicates the
bits to be added to index ZC to obtain index ZB. Za indicates the bits to be added to index ZB to
obtain index ZA.
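
Because each child index extends its parent's index, the three indices nest, and storing ZC plus
the two increments suffices. A minimal sketch, with indices represented as bit strings and with
hypothetical lengths chosen only for illustration:

```python
def split_embedded(za_index, len_c, len_b):
    """Split a full-precision index into (ZC, Zb, Za).

    len_c and len_b are the bit lengths of ZC and ZB for this table entry;
    the values used below are hypothetical.
    """
    zc = za_index[:len_c]            # coarsest index
    zb_bits = za_index[len_c:len_b]  # bits that extend ZC to ZB
    za_bits = za_index[len_b:]       # bits that extend ZB to ZA
    return zc, zb_bits, za_bits

zc, zb_bits, za_bits = split_embedded("1011001", len_c=3, len_b=5)
ZB = zc + zb_bits   # medium-precision index
ZA = ZB + za_bits   # full-precision index
```

A receiver wanting only the coarse code can stop after ZC; one wanting more precision appends
Zb, and then Za.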

Fill-in procedure 20 for table T4 begins at step 21 with the generation of the addresses
corresponding to all possible distinct pairs of inputs (Y1, Y2). Each third stage index Yj is
decoded at step 22 to yield the respective eight-dimensional spatial-frequency domain table T3
codebook vector. An inverse DCT is applied at step 23 to these table T3 codebook vectors to
obtain the corresponding eight-dimensional pixel domain vectors representing 4x2 pixel blocks.
These vectors are concatenated at step 24 to form a sixteen-dimensional pixel-domain vector
corresponding to a respective 4x4 pixel block. A DCT is applied at step 25 to yield a respective
sixteen-dimensional spatial frequency domain vector in the same space as the table T4 codebook.

The closest table T4 codebook vector in each of the three sets of codebook vectors is
identified at step 26, using an unweighted proximity measure. The class indices ZA, ZB, and ZC
associated with the closest codebook vectors are assigned to the table T4 address under
consideration. Once this assignment is iterated for all table T4 addresses, design of table T4 is
complete. Once all tables T1-T4 are complete, design of hierarchical table HLT is complete.

The performance of the resulting compression system is indicated in FIG. 4 for the
variable-rate tree-structured hierarchical table-based vector quantization (VRTSHVQ) compression
case of the preferred embodiment. It is noted that the compression effectiveness is slightly worse
than for non-hierarchical variable-rate tree-structured table-based vector quantization (VRTSVQ)
compression. However, it is significantly better than plain hierarchical vector quantization
(HVQ).

More detailed descriptions of the methods for incorporating perceptual measures, a
tree-structure, and entropy constraints in a hierarchical VQ lookup table are presented below. To
accommodate the increased sophistication of the description, some change in notation is required.
The examples below employ perceptual measures during table fill-in; in accordance with the
present invention, it is maintained that lower distortion is achievable using unweighted measures
for table fill-in.

The tables used to implement vector quantization can also implement block transforms. In
these table lookup encoders, input vectors to the encoders are used directly as addresses in code
tables to choose the codewords. There is no need to perform the forward or reverse transforms.
They are implemented in the tables. Hierarchical tables can be used to preserve manageable table
sizes for large dimension VQs to quantize a vector in stages. Since both the encoder and decoder
are implemented by table lookups, there are no arithmetic computations required in the final
system implementation. The algorithms are a novel combination of any generic block transform
(DCT, Haar, WHT) and hierarchical vector quantization. They use perceptual weighting and
subjective distortion measures in the design of VQs. They are unique in that both the encoder and
the decoder are implemented with only table lookups and are amenable to efficient software and
hardware solutions.

Full-search vector quantization (VQ) is computationally asymmetric in that the decoder can
be implemented as a simple table lookup, while the encoder must usually be implemented as an
exhaustive search for the minimum distortion codeword. VQ therefore finds application to
problems where the decoder must be extremely simple, but the encoder may be relatively complex,
e.g., software decoding of video from a CDROM.

Various structured vector quantizers have been introduced to reduce the complexity of a
full-search encoder. For example, a transform code is a structured vector quantizer in which the
encoder performs a linear transformation followed by scalar quantization of the transform
coefficients. This structure also increases the decoder complexity, however, since the decoder
must now perform an inverse transform. Thus in transform coding, the computational
complexities of the encoder and decoder are essentially balanced, and hence transform coding
finds natural application to point-to-point communication, such as video telephony. A special
advantage of transform coding is that perceptual weighting, according to frequency sensitivity, is
simple to perform by allocating bits appropriately among transform coefficients.

A number of other structured vector quantization schemes decrease encoder complexity but
do not simultaneously increase decoder complexity. Such schemes include tree-structured VQ,
lattice VQ, fine-to-coarse VQ, etc. Hierarchical table-based vector quantization (HTBVQ) replaces
the full-search encoder with a hierarchical arrangement of table lookups, resulting in a maximum
of one table lookup per sample to encode. The result is a balanced scheme, but with extremely
low computational complexity at both the encoder and decoder. Furthermore, the hierarchical
arrangement allows efficient encoding for multiple rates. Thus HVQ finds natural application to
collaborative video over heterogeneous networks of inexpensive general purpose computers.

Perceptually significant distortion measures can be integrated into HTBVQ based on
weighting the coefficients of arbitrary transforms. Essentially, the transforms are pre-computed
and built into the encoder and decoder lookup tables. Thus gained are the perceptual advantages
of transform coding while maintaining the computational simplicity of table lookup encoding and
decoding.

HTBVQ is a method of encoding vectors using only table lookups. A straightforward
method of encoding using table lookups is to address a table directly by the symbols in the input
vector. For example, suppose each input symbol is pre-quantized to $r_0 = 8$ bits of precision (as is
typical for the pixels in a monochrome image), and suppose the vector dimension is K = 2. Then
a lookup table with $K r_0 = 16$ address bits and $\log_2 N$ output bits (where N is the number of
codewords in the codebook) could be used to encode each two-dimensional vector into the index
of its nearest codeword using a single table lookup. Unfortunately, the table size in this
straightforward method gets infeasibly large for even moderate K. For image coding, we may
want K to be as large as 64, so that we have the possibility of coding each 8x8 block of pixels as a
single vector.
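
The growth in table size can be made concrete with a short calculation. The function name
`direct_table_entries` is illustrative; it simply evaluates two raised to the number of address
bits.

```python
def direct_table_entries(K, r0=8):
    """Addresses in a table indexed directly by K symbols of r0 bits each."""
    return 2 ** (K * r0)

entries_k2 = direct_table_entries(2)    # 16 address bits
entries_k4 = direct_table_entries(4)    # 32 address bits
entries_k64 = direct_table_entries(64)  # 512 address bits
```

Already at K = 4 the table has more than four billion addresses, and at K = 64 the address
space is far beyond any physical memory.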

By performing the table lookups in a hierarchy, larger vectors can be accommodated in a
practical way, as shown in FIG. 1. In the figure, a K = 8 dimensional vector at original
precision $r_0 = 8$ bits per symbol is encoded into $r_M = 8$ bits per vector (i.e., at rate $R = r_M/K = 1$ bit
per symbol for a compression ratio of 8:1) using M = 3 stages of table lookups. In the first stage,
the K input symbols are partitioned into blocks of size $k_0 = 2$, and each of these blocks is used to
directly address a lookup table with $k_0 r_0 = 16$ address bits to produce $r_1 = 8$ output bits.

Likewise, in each successive stage m from 1 to M, the $r_{m-1}$-bit outputs from the previous
stage are combined into blocks of length $k_m$ to directly address a lookup table with $k_m r_{m-1}$ address
bits to produce $r_m$ output bits per block. The $r_M$ bits output from the final stage M may be sent
directly through the channel to the decoder, if the quantizer is a fixed-rate quantizer, or the bits
may be used to index a table of variable-length codes, for example, if the quantizer is a
variable-rate quantizer. In the fixed-rate case, $r_M$ determines the overall bit rate of the quantizer,
$R = r_M/K$ bits per symbol, where $K = K_M = \prod_{m=1}^{M} k_m$ is the overall dimension of the quantizer.
Indeed, at each stage m, $r_m$ determines the bit rate of a fixed-rate quantizer with dimension
$K_m = \prod_{i=1}^{m} k_i$. Hence if $k_m = 2$ and $r_m = 8$ for all m, then after each stage in the hierarchy, the
vector dimension $K_m$ doubles and the bit rate $r_m/K_m$ halves, i.e., the compression ratio doubles.
Note that the resulting sequence of fixed-rate quantizers can be used for multi-rate coding.
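
The staged lookups can be sketched as follows. This is a minimal illustration, assuming pairwise stages with 8-bit indices throughout; the name `hvq_encode` and the averaging table `toy_table` are hypothetical stand-ins, and a real table would hold codeword indices produced by the design procedure rather than averages.

```python
def hvq_encode(symbols, tables):
    """Encode a block of 2**M symbols with M stages of pairwise table lookups.

    tables[m][a][b] is the output index of stage m+1 for the input pair (a, b).
    The outputs of every stage are returned, since the output of any stage
    is itself a valid code at a lower compression ratio (multi-rate coding).
    """
    outputs = []
    current = list(symbols)
    for table in tables:
        assert len(current) % 2 == 0, "block length must be a power of two"
        current = [table[current[i]][current[i + 1]]
                   for i in range(0, len(current), 2)]
        outputs.append(current)
    return outputs

# Toy stand-in for a designed table: each entry is the rounded-down mean.
toy_table = [[(a + b) // 2 for b in range(256)] for a in range(256)]
stages = hvq_encode([10, 12, 200, 202, 50, 54, 90, 94], [toy_table] * 3)
print(stages)
```

For the K = 8 block above, the three stages perform 4, 2 and 1 lookups respectively, and the final list holds the single 8-bit index for the block.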

The computational complexity of the encoder is at most one table lookup per input symbol,
since there are at most $1/K_m \le 1/2^m$ table lookups per input symbol in the mth stage, and
$\sum_{m=1}^{M} 2^{-m} \le 1$. The storage requirements of the encoder are $2^{k_m r_{m-1}} \times r_m$ bits for a table in the
mth stage. If $k_m = 2$ and $r_m = 8$ for all m, then each table is a 64 Kbyte table, so that assuming all
the tables within a stage are identical, only one 64 Kbyte table is required for each of the
$M = \log_2 K$ stages of the hierarchy. Clearly many values for $k_m$ and $r_m$ are possible, but $k_m = 2$ and
$r_m = 8$ are usually most convenient for the purposes of implementation. The following description
can be extrapolated to cover the other values.
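
The lookup count and the table size quoted above can be verified numerically. The helper names below are hypothetical, and the sketch assumes blocks of two inputs at every stage.

```python
def lookups_per_symbol(M):
    """Table lookups per input symbol for M pairwise stages: sum of 2**-m."""
    return sum(2.0 ** -m for m in range(1, M + 1))

def table_bytes(k, r_in, r_out):
    """Storage of one stage table: 2**(k * r_in) entries of r_out bits each."""
    return (1 << (k * r_in)) * r_out // 8

print(lookups_per_symbol(3))   # 7 lookups for 8 symbols
print(table_bytes(2, 8, 8))    # one stage table, in bytes
```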

The main issue to address at this point is the design of the tables' contents. The table at
stage m can be regarded as a mapping from two input indices $i_1^{m-1}$ and $i_2^{m-1}$, each in
{0, 1, ..., 255}, to an output index $i^m$ also in {0, 1, ..., 255}. With respect to a distortion
measure $d_m(x, \hat{x})$ between vectors of dimension $K_m = 2^m$, design a fixed-rate VQ codebook $\beta_m(i)$,
i = 0, 1, ..., 255, with dimension $K_m = 2^m$ and rate $r_m/K_m = 8/2^m$ bits per symbol, trained on the
original data using any convenient VQ design algorithm (such as the generalized Lloyd algorithm).
Then set

$i^m(i_1^{m-1}, i_2^{m-1}) = \arg\min_i d_m\big((\beta_{m-1}(i_1^{m-1}), \beta_{m-1}(i_2^{m-1})), \beta_m(i)\big)$

to be the index of the $2^m$-dimensional codeword closest to the $2^m$-dimensional vector constructed by
concatenating the $2^{m-1}$-dimensional codewords $\beta_{m-1}(i_1^{m-1})$ and $\beta_{m-1}(i_2^{m-1})$. The intuition behind
this construction is that if $\beta_{m-1}(i_1^{m-1})$ is a good representative of the first half of the
$2^m$-dimensional input vector, and $\beta_{m-1}(i_2^{m-1})$ is a good representative of the second half, then
$\beta_m(i^m)$, with $i^m$ defined above, will be a good representative of both halves, in the codebook
$\beta_m(i)$, i = 0, 1, ..., 255.
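
The fill-in rule can be sketched in Python as shown below. This is an illustration under stated assumptions: the distortion is plain squared error, the codebooks are tiny hypothetical examples with 4 codewords instead of 256, and `build_stage_table` is a name introduced here only for the sketch.

```python
def squared_error(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def build_stage_table(prev_codebook, codebook, distortion=squared_error):
    """table[i1][i2] = index of the codeword in `codebook` nearest to the
    concatenation of prev_codebook[i1] and prev_codebook[i2]."""
    table = []
    for c1 in prev_codebook:
        row = []
        for c2 in prev_codebook:
            target = tuple(c1) + tuple(c2)
            row.append(min(range(len(codebook)),
                           key=lambda i: distortion(target, codebook[i])))
        table.append(row)
    return table

# Hypothetical toy codebooks (the text uses 256 codewords per stage).
prev_cb = [(0, 0), (0, 255), (255, 0), (255, 255)]      # dimension 2
next_cb = [(0, 0, 0, 0), (0, 0, 255, 255),
           (255, 255, 0, 0), (255, 255, 255, 255)]      # dimension 4
table = build_stage_table(prev_cb, next_cb)
print(table[0][3], table[3][0])
```

Any other distortion measure can be passed as `distortion`; its cost is paid once at design time and not during encoding.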

An advantage of HTBVQ is that complexity of the encoder does not depend on the 
complexity of the distortion measure, since the distortion measure is pre-computed into the tables. 
Hence HTBVQ is ideally suited to implementing perceptually meaningful, if complex, distortion
measures. 

Let $d'(x, \hat{x})$ be an arbitrary non-negative distortion measure on $\mathbb{R}^K \times \mathbb{R}^K$ such that for each
x, $d'(x, \hat{x})$ as a function of $\hat{x}$ is zero at $\hat{x} = x$ and is twice continuously differentiable in $\hat{x}$ at x.
Then $d'(x, \hat{x})$ as a function of $\hat{x}$ has a Taylor series expansion around x in which the constant and
first order terms are zero, and the quadratic term is non-negative semi-definite. Hence the
distortion measure may be approximated by the input-weighted squared error
$d(x, \hat{x}) = (x - \hat{x})^t M_x (x - \hat{x})$, where $x^t$ denotes the transpose of x and $M_x$ is the matrix of second
derivatives of $d'(x, \hat{x})$ as a function of $\hat{x}$ at x divided by 2. Since $M_x$ is symmetric and
non-negative semi-definite, it may be diagonalized to a matrix of its non-negative eigenvalues, say

$d(x, \hat{x}) = (T_x x - T_x \hat{x})^t W_x (T_x x - T_x \hat{x})$, where

$W_x = \mathrm{diag}(w_1, \ldots, w_K)$ and K is the dimension of x.

If the diagonalizing matrix $T_x$ (of normalized eigenvectors of $M_x$) does not depend on x,
then

$d(x, \hat{x}) = (Tx - T\hat{x})^t W_x (Tx - T\hat{x}) = \sum_{j=1}^{K} w_j (y_j - \hat{y}_j)^2 = d_T(y, \hat{y})$,

where $y_j$ and $\hat{y}_j$ are the components of $y = Tx$ and $\hat{y} = T\hat{x}$, respectively. That is, the distortion
is the weighted sum of squared differences between the transform coefficients $y$ and $\hat{y}$. We shall
henceforth assume that T is the transformation matrix of some fixed transform, such as the Haar,
Walsh-Hadamard, or discrete cosine transform, and we shall let the weights $W_x$ vary arbitrarily
with x. This is a reasonably general class of perceptual distortion measures.

When there is no weighting, i.e., when $W_x = I$, then $d(x, \hat{x}) = \|Tx - T\hat{x}\|^2 = \|x - \hat{x}\|^2$
regardless of the orthogonal transformation T. This is because the rows (and columns) of T are 
orthonormal, and therefore T is a distance-preserving rotation and/or reflection. Hence when the 
weighting is uniform, the squared error in the transformed space equals the squared error in the 
original space, regardless of whether the transform is the Haar transform (HT), Walsh-Hadamard 
transform (WHT), discrete cosine transform (DCT), etc. Indeed, full-search VQ codebooks 
designed in transform space to minimize the mean squared error for different transforms T are all 
equivalent, since their codewords are simple rotations and/or reflections of each other. The energy 
compaction criterion so crucial to determining the best transform for scalar quantization of the
coefficients is irrelevant for determining the best transform for vector quantization of the 
coefficients, when the weights are uniform. 

When the weights are not uniform, different orthogonal transformations result in different 
distortion measures. Thus nonuniform weights play an essential role in this class of perceptual 
distortion measures. 
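
A small numerical check illustrates both statements. The sketch below assumes the orthonormal 2-point Haar transform (which coincides with the 2-point Walsh-Hadamard transform); the weights in the last call are hypothetical values chosen only to show that nonuniform weighting changes the measure.

```python
import math

def haar2(x):
    """Orthonormal 2-point Haar transform."""
    s = 1.0 / math.sqrt(2.0)
    return [s * (x[0] + x[1]), s * (x[0] - x[1])]

def weighted_sq_error(y, y_hat, w):
    """Weighted sum of squared coefficient differences."""
    return sum(wj * (a - b) ** 2 for wj, a, b in zip(w, y, y_hat))

x, x_hat = [100.0, 104.0], [101.0, 101.0]
d_pixel = weighted_sq_error(x, x_hat, [1.0, 1.0])
d_uniform = weighted_sq_error(haar2(x), haar2(x_hat), [1.0, 1.0])
d_weighted = weighted_sq_error(haar2(x), haar2(x_hat), [1.0, 0.25])
print(d_pixel, d_uniform, d_weighted)
```

With uniform weights the error in the transform domain matches the error in the pixel domain up to rounding; with the nonuniform weights it does not.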

The weights reflect human visual sensitivity to quantization errors in different transform 
coefficients, or bands. The weights may be input-dependent to model masking effects. When
used in the perceptual distortion measure for vector quantization, the weights control an effective
stepsize, or bit allocation, for each band. Consider uniform scalar quantization of the transform
coefficients, as in JPEG, for example. By setting the stepsizes $s_1, \ldots, s_K$ of the scalar quantizers
for each of the K bands, bits are allocated between bands in accordance with the strength of the
signal in the band and an appropriate perceptual model. The encoding regions of the resulting
product code are hyper-rectangles with side $s_j$ along the jth axis, j = 1, ..., K.

When the transform coefficients are vector quantized with respect to a weighted squared
error distortion measure, the weights $w_1, \ldots, w_K$ play a role corresponding to the stepsizes. The
weighted distortion measure (in the transform domain) $d_T(y, \hat{y})$ equals

$\sum_{j=1}^{K} \big(\sqrt{w_j}\, y_j - \sqrt{w_j}\, \hat{y}_j\big)^2$,

which is the ordinary (unweighted) squared error of a transform whose K coefficients have been
scaled by the factors $\sqrt{w_j}$, j = 1, ..., K. In this scaled transform space, the vector quantizer
with the minimum mean squared error subject to an entropy constraint has a uniform codeword
density (at least for large numbers of codewords), so that each encoding cell has the same
volume V in K-space. Hence each encoding cell has linear dimension $V^{1/K}$ (times a sphere packing
coefficient less than 1) in the scaled space. In the unscaled space, each encoding cell has roughly
linear dimension $V^{1/K}/\sqrt{w_j}$ along the jth coordinate. Thus the square roots of the weights $w_j$,
j = 1, ..., K, correspond to the inverse of the scale factors $s_j$, j = 1, ..., K, or $w_j \propto s_j^{-2}$. One way
to derive a perceptual distortion measure is to use the DCT for the transformation matrix and the
squared inverse of the JPEG stepsizes for the weights.
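
As a sketch of this last suggestion, weights may be derived from a list of stepsizes as follows. The stepsizes shown are hypothetical and are not taken from any JPEG quantization table, and the normalization to a maximum weight of 1 is an assumption made here for readability only.

```python
def weights_from_stepsizes(stepsizes):
    """Weights proportional to 1 / s_j**2, scaled so the largest weight is 1."""
    raw = [1.0 / (s * s) for s in stepsizes]
    top = max(raw)
    return [r / top for r in raw]

# Hypothetical stepsizes for a 4-coefficient transform, coarser at high frequency.
weights = weights_from_stepsizes([8, 16, 16, 32])
print(weights)
```

Doubling a stepsize divides the corresponding weight by four, so errors in coarsely quantized bands contribute less to the distortion.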

HTBVQ can be combined with block based transforms like the DCT, the Haar and the 
Walsh-Hadamard Transform, perceptually weighted to improve visual performance. Herein the 
combination is referred to as Weighted Transform HVQ (WTHVQ). Here, we apply WTHVQ to 
image coding. 

The encoder of a WTHVQ consists of M stages (as in FIG. 1), each stage being 
implemented by a lookup table. For image coding, separable transforms are employed, so the odd 
stages operate on the rows while the even stages operate on the columns of the image. The first 
stage combines $k_1 = 2$ horizontally adjacent pixels of the input image as an address to the first
lookup table. This first stage corresponds to a 2x1 transform on the input image followed by
perceptually weighted vector quantization using a subjective distortion measure, with 256
codewords. Thus the rate is halved at each stage of the WTHVQ. The first stage gives a 
compression of 2:1. 

The second stage combines $k_2 = 2$ outputs of the first stage that are vertically adjacent as
an address to the second stage lookup table. The second stage corresponds to a 2x2 transform on 
the input image followed by perceptually weighted vector quantization using a subjective distortion 
measure, with 256 codewords. The only difference is that the 2x2 vector is quantized 
successively in two stages. The compression achieved after the second stage is 4:1. 

In stage i, 1 < i ≤ M, the address for the table is constructed by using $k_i = 2$ adjacent
outputs of the previous stage and the addressed content is directly used as the address for the
next stage. Stage i corresponds to a $2^{i/2} \times 2^{i/2}$ perceptually weighted transform, for i even, or a
$2^{(i+1)/2} \times 2^{(i-1)/2}$ transform, for i odd, followed by a perceptually weighted vector quantizer using a
subjective distortion measure with 256 codewords. The only difference is that the quantization is
performed successively in i stages. The compression achieved after stage i is $2^i$:1. Thus the
overall vector dimension is $K = \prod_{i=1}^{M} k_i$. The overall compression ratio after the M stages is $2^M$:1.
The last stage produces the encoding index w, which represents an approximation to the input
(perceptually weighted transform) vector and sends it to the decoder. This encoding index is
similar to that obtained in a direct transform VQ with an input weighted distortion measure. The
decoder of a WTHVQ is the same as a decoder of such a transform VQ. That is, it is a lookup
table in which the reverse transform is done ahead of time on the codewords.

The computational and storage requirements of WTHVQ are the same as those of ordinary
HVQ. In principle, the design algorithm for WTHVQ is the same as that of ordinary HVQ, but
using a perceptual distortion measure. In practice, however, computation savings result by
transforming the data and designing the WTHVQ in the transformed space, using the orthogonally
weighted distortion measure $d_T$.

The design of a WTHVQ consists of two major steps. The first step designs VQ 
codebooks for each transform stage. Since each perceptually weighted transform VQ stage has a 
different dimension and rate they are designed separately. A subjectively meaningful distortion 
measure as described above is used for designing the codebooks. 

The codebooks for each stage of the WTHVQ are designed independently by the
generalized Lloyd algorithm (GLA) run on the transform of the appropriate order on the training
sequence. The first stage codebook with 256 codewords is designed by running GLA on a 2x1
transform (DCT, Haar, or WHT) of the training sequence. Similarly the stage i codebook (256
codewords) is designed using the GLA on a transform of the training sequence of the appropriate
order for that stage. The reconstructed codewords for the transformed data using the subjective
distortion measure $d_T$ are given by:

$\hat{y} = \arg\min_{\hat{y}} E[d_T(Y, \hat{y})] = (E[W_X])^{-1} E[W_X Y]$
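
For diagonal weight matrices this centroid reduces to a weighted average taken separately for each coefficient. The following sketch assumes that case and replaces the expectations by sums over the training vectors of one cell; the function name and the sample values are hypothetical.

```python
def weighted_centroid(vectors, weights):
    """Per-coefficient centroid sum(w * y) / sum(w) for diagonal weights.

    vectors[n][j] is coefficient j of training vector n; weights[n][j] is
    the (possibly input-dependent) weight applied to that coefficient."""
    dim = len(vectors[0])
    centroid = []
    for j in range(dim):
        num = sum(w[j] * y[j] for y, w in zip(vectors, weights))
        den = sum(w[j] for w in weights)
        centroid.append(num / den)
    return centroid

ys = [[10.0, 4.0], [20.0, 8.0]]
ws = [[1.0, 1.0], [3.0, 1.0]]   # hypothetical weights
centroid = weighted_centroid(ys, ws)
print(centroid)
```

The first coefficient is pulled toward the more heavily weighted vector, while the second, with equal weights, is the ordinary mean.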

The original training sequence is used to design all stages by transforming it using the
corresponding transforms of the appropriate order for each stage. In reality, the input training
sequences seen by the individual stages are generally different, because the input to each stage
has passed through all of the previous stages and has been successively quantized by each of
them.

The second step in the design of WTHVQ builds lookup tables from the designed
codebooks. After having built each codebook for the transform, the corresponding code tables are
built for each stage. The first stage table is built by taking different combinations of two 8-bit
input pixels. There are $2^{16}$ such combinations. For each combination a 2x1 transform is
performed. The index of the codeword closest to the transform for the combination in the sense of
the minimum distortion rule (subjective distortion measure $d_T$) is put in the output entry of the table
for that particular input combination. This procedure is repeated for all possible input
combinations. Each output entry ($2^{16}$ total entries) of the first stage table has 8 bits.

The second stage table operates on the columns. Thus for the second stage the product
combination of two first stage tables is taken by taking the product of two 8-bit outputs from the
first stage table. There are $2^{16}$ such entries for the second stage table. For a particular entry a
successively quantized 2x2 transform is obtained by doing a 2x1 inverse transform on the two
codewords obtained by using the indices for the first stage codebook. Now on the 2x2 raw data
obtained a 2x2 transform is performed and the index of the codeword closest to this transformed
vector in the sense of the subjective distortion measure $d_T$ is put in the corresponding output entry.
This procedure is repeated for all input entries in the table. Each output entry for the second
stage table also has 8 bits.

The third stage table operates on the rows. Thus for the third stage the product
combination of two second stage tables is obtained by taking the product of the output entries of
the second stage tables. Each output entry of the second stage table has 8 bits. Thus the total
number of different input entries to the third stage table is $2^{16}$. For a particular entry a
successively quantized 4x2 transform is obtained by doing a 2x2 inverse transform on the two
codewords obtained by using the indices for the second stage codebook. Now on the 4x2 raw
data obtained a 4x2 transform is performed and the index of the codeword closest in the sense of
the subjective distortion measure $d_T$ to this transformed vector is put in the corresponding output
entry.

All remaining stage tables are built in a similar fashion by performing two inverse 
transforms and then performing a forward transform on the data. The nearest codeword to this 
transform data in the sense of subjective distortion measure $d_T$ is obtained from the codebook for
that stage and the corresponding index is put in the table. The last stage table has the index of the
codeword as its output entry which is sent to the decoder. The decoder has a copy of the last
stage codebook and uses the index for the last stage to output the corresponding codeword.

A simpler table building procedure can be used for the Haar and the Walsh-Hadamard
transforms. This is possible because the Haar and WHT have the property that a higher order
transform can be obtained as a linear combination of lower order transforms on the partitioned
data. The table building for the DCT, i.e., the inverse transform method, will be more expensive
than for the Haar and the WHT because at each stage two inverse transforms and one forward DCT
transform must be performed.
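
The property can be illustrated for the Walsh-Hadamard transform. The sketch below assumes the unnormalized transform in natural (Hadamard) ordering; the function names are hypothetical. It shows that the 4-point transform of a block follows from sums and differences of the 2-point transforms of its two halves, so no inverse transform is required.

```python
def wht2(x):
    """Unnormalized 2-point Walsh-Hadamard transform."""
    return [x[0] + x[1], x[0] - x[1]]

def wht4_from_halves(a, b):
    """4-point WHT (natural Hadamard order) from the 2-point WHTs a and b
    of the first and second halves of the block."""
    return [a[0] + b[0], a[1] + b[1], a[0] - b[0], a[1] - b[1]]

def wht4_direct(x):
    """4-point WHT by direct multiplication with the Hadamard matrix."""
    H4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    return [sum(h * v for h, v in zip(row, x)) for row in H4]

x = [5, 3, 8, 2]
combined = wht4_from_halves(wht2(x[:2]), wht2(x[2:]))
print(combined, wht4_direct(x))
```

Both computations give the same coefficients, which is why codewords stored in the transform domain can be combined directly when building the next stage table.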

Simulation results have been obtained for the different HVQ algorithms. The
algorithms are compared against JPEG and full search VQ. Table II gives the PSNR results on
the 8-bit monochrome image Lena (512x512) for different compression ratios for JPEG, full- 
search plain VQ, full-search unweighted Haar VQ, full-search unweighted WHT VQ and full- 
search unweighted DCT VQ. The codebooks for the VQ have been generated by training on five 
different images (Womanl, Woman2, Man, Couple and Crowd). 

It can be seen from Table II that the PSNR results of plain VQ and unweighted transform
VQ are the same at each compression ratio. This is because the transforms are all orthogonal; any
differences are due to the fact that the splitting algorithm in the GLA is sensitive to the coordinate
system. JPEG performs around 5 dB better than these schemes since it is a variable rate code.
These VQ based algorithms, being fixed rate, have other advantages compared to JPEG. However,
by using entropy coding along with these algorithms, 25% more compression can be achieved.



Table II: PSNR results (dB)

Compression Ratio   JPEG   Plain VQ   Haar VQ   WHT VQ   DCT VQ
2:1                 46.9   41.7       41.7      41.7     41.7
4:1                 40.8   35.9       35.8      35.8     35.8
8:1                 37.7   32.5       32.5      32.5     32.5
16:1                34.7   30.5       30.5      30.5     30.5



Table III gives the PSNR results on Lena for different compression ratios for plain HVQ,
unweighted Haar HVQ, unweighted WHT HVQ and unweighted DCT HVQ. It can be seen from
Table III that the PSNR results of transform HVQ are the same as the plain HVQ results for the
same compression ratio. Comparing the results of Table III with Table II we find that the HVQ
based schemes perform around 0.7 dB worse than the full search VQ schemes.



Table III: PSNR Results of HVQs (dB)

Compression Ratio   HVQ    Haar VQ   WHT VQ   DCT VQ
2:1                 41.7   41.7      41.7     41.7
4:1                 35.3   35.3      35.3     35.3
8:1                 31.8   31.8      31.8     31.8
16:1                29.7   29.7      29.7     29.7



Table IV gives the PSNR results on Lena for different compression ratios for full search
plain VQ, perceptually weighted full search Haar VQ, perceptually weighted full search WHT VQ
and perceptually weighted full search DCT VQ. The weighting increases the subjective quality of
the compressed images, though it reduces the PSNR. The subjective quality of the images
compressed using weighted VQ's is much better than that of the unweighted VQ's. Table IV also
gives the PSNR results on Lena for different compression ratios for perceptually weighted Haar
HVQ, WHT HVQ and DCT HVQ. The visual quality of the compressed images obtained using
weighted transform HVQ's is significantly higher than for plain HVQ. The quality of the
weighted transform VQ's compressed images is about the same as that of the weighted transform
HVQ's compressed images.



Table IV: PSNR results of Perceptually Weighted VQ's and HVQ's (dB)

Compression   Plain   Haar   WHT    DCT    Haar   WHT    DCT
Ratio         VQ      VQ     VQ     VQ     HVQ    HVQ    HVQ
2:1           41.7    39.4   39.4   39.4   40.0   40.0   40.0
4:1           35.9    35.1   35.1   35.1   34.8   34.8   34.8
8:1           32.5    31.8   31.8   31.9   31.6   31.6   31.7
16:1          30.5    29.9   29.9   30.0   29.8   29.8   29.8



Table V gives the encoding times of the different algorithms on a SUN Sparc-10
workstation on Lena. It can be seen from Table V that the encoding times of the transform HVQ
and plain HVQ are the same. It takes 12 ms for the first stage encoding, 24 ms for the second stage
encoding and so on. On the other hand, JPEG requires 250 ms for encoding at all compression
ratios. Thus the HVQ based encoders are 10-25 times faster than a JPEG encoder. The HVQ
based encoders are also around 50-100 times faster than full search VQ based encoders. This low
computational complexity of HVQ is very useful for collaborative video over heterogeneous
networks. It makes 30 frames per second software only video encoding possible on general
purpose workstations.

Table V: Encoding times in ms of different algorithms

Compression Ratio   Transform HVQ   Transform VQ   HVQ           VQ            JPEG
2:1                 12              [illegible]    [illegible]   [illegible]   [illegible]
4:1                 24              900            24            800           250
8:1                 27              900            27            800           250
16:1                30              900            30            800           250



Table VI gives the decoding times of different algorithms on a SUN Sparc-10 workstation
on Lena. It can be seen from Table VI that the decoding times of the transform HVQ, plain HVQ,
plain VQ and transform VQ are the same. It takes 13 ms for decoding a 2:1 compressed image, 16 ms
for decoding a 4:1 compressed image and so on. On the other hand, JPEG requires 200 ms for
decoding at all compression ratios. Thus the HVQ based decoders are 20-40 times faster than a
JPEG decoder. The decoding times of transform VQ are the same as those of plain VQ, as the
transforms can be precomputed in the decoder tables. This low computational complexity of
HVQ decoding again allows 30 frames per second video decoding in software.



Table VI: Decoding times in ms of different algorithms

Compression Ratio   Transform HVQ   Transform VQ   HVQ   VQ    JPEG
2:1                 13              13             13    13    200
4:1                 16              16             16    16    200
8:1                 8.5             8.5            8.5   8.5   200
16:1                6.1             6.1            6.1   6.1   200



The presented techniques for the design of generic block transform based vector quantizer
(WTHVQ) encoders implemented by only table lookups reduce the complexity of a full-search VQ
encoder. Perceptually significant distortion measures are incorporated into HVQ based on
weighting the coefficients of arbitrary transforms. Essentially, the transforms are pre-computed
and built into the encoder and decoder lookup tables. The perceptual advantages of transform
coding are achieved while maintaining the computational simplicity of table lookup encoding and
decoding. These algorithms have applications in multi-rate collaborative video environments.
These algorithms (WTHVQ) are also amenable to efficient software and hardware solutions. The
low computational complexity of WTHVQ allows 30 frames per second video encoding and
decoding in software.

Techniques for the design of generic constrained and recursive vector quantizer encoders 
implemented by table-lookups include entropy-constrained VQ, tree-structured VQ, classified
VQ, product VQ, mean-removed VQ, multi-stage VQ, hierarchical VQ, non-linear interpolative 
VQ, predictive VQ and weighted universal VQ. These different VQ structures can be combined 
with hierarchical table-lookup vector quantization using the algorithms presented below. 

Specifically considered are: entropy-constrained VQ to get a variable rate code and
tree-structured VQ to get an embedded code. In addition, classified VQ, product VQ, mean-removed
VQ, multi-stage VQ, hierarchical VQ and non-linear interpolative VQ are considered to overcome
the complexity problems of unconstrained VQ and thereby allow the use of higher vector
dimensions and larger codebook sizes. Recursive vector quantizers such as predictive VQ achieve
the performance of a memory-less VQ with a large codebook while using a much smaller
codebook. Weighted universal VQ provides for multi-codebook systems.

Perceptually weighted hierarchical table-lookup VQ can be combined with different constrained
and recursive VQ structures. At the heart of each of these structures, the HVQ encoder
still consists of M stages of table lookups. The last stage differs for the different forms of VQ 
structures. 

Entropy-constrained vector quantization (ECVQ), which minimizes the average
distortion subject to a constraint on the entropy of the codewords, can be used to obtain a
variable-rate system. ECHVQ has the same structure as HVQ, except that the last stage codebook and
table are variable-rate. The last stage codebook and table are designed using the ECVQ algorithm,
in which an unconstrained minimization problem is solved: $\min(D + \lambda H)$, where D is the average
distortion (obtained by taking the expected value of d defined above) and H is the entropy. Thus this
modified distortion measure is used in the design of the last stage codebook and table. The last
stage table outputs a variable length index which is sent to the decoder. The decoder has a copy
of the last stage codebook and uses the index for the last stage to output the corresponding
codeword.

The design of an ECHVQ consists of two major steps. The first step designs VQ
codebooks for each stage. Since each VQ stage has a different dimension and rate they are
designed separately. As described above, a subjectively meaningful distortion measure is used
for designing the codebooks. The codebooks for each stage except the last stage of the ECHVQ
are designed independently by the generalized Lloyd algorithm (GLA) run on the appropriate
vector size of the training sequence. The last stage codebook is designed using the ECVQ
algorithm. The second step in the design of ECHVQ builds lookup tables from the designed
codebooks. After having built each codebook the corresponding code tables are built for each
stage. All tables except the last stage table are built using the procedure described above. The last
stage table is designed using a modified distortion measure. In general the last stage table
implements the mapping

$i^M(i_1^{M-1}, i_2^{M-1}) = \arg\min_i \big[ d_M\big((\beta_{M-1}(i_1^{M-1}), \beta_{M-1}(i_2^{M-1})), \beta_M(i)\big) + \lambda r_M(i) \big]$

where $r_M(i)$ is the number of bits representing the ith codeword in the last stage codebook. Only
the last stage codebook and table need differ for different values of lambda.
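
The effect of the Lagrangian term can be seen in a small sketch. It assumes squared error as the distortion and uses a hypothetical three-codeword codebook with hypothetical codeword lengths; `ec_index` is a name introduced only for this illustration.

```python
def ec_index(target, codebook, lengths, lam):
    """Index i minimizing squared error to `target` plus lam * lengths[i]."""
    def cost(i):
        d = sum((a - b) ** 2 for a, b in zip(target, codebook[i]))
        return d + lam * lengths[i]
    return min(range(len(codebook)), key=cost)

# Hypothetical codebook and codeword lengths in bits.
codebook = [(0.0, 0.0), (4.0, 4.0), (5.0, 5.0)]
lengths = [1, 2, 6]
target = (5.0, 5.0)
nearest_index = ec_index(target, codebook, lengths, lam=0.0)
constrained_index = ec_index(target, codebook, lengths, lam=1.0)
print(nearest_index, constrained_index)
```

With lambda equal to zero the nearest codeword is chosen; with a larger lambda a slightly more distant codeword with a shorter code is preferred. In a table-based encoder this choice would be made once per table entry at design time.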

A tree-structured VQ at the last stage of HVQ can be used to obtain an embedded code. In
ordinary VQ, the codewords lie in an unstructured codebook, and each input vector is mapped to
the minimum distortion codeword. This induces a partition of the input space into Voronoi
encoding regions. In TSVQ, on the other hand, the codewords are arranged in a tree structure,
and each input vector is successively mapped (from the root node) to the minimum distortion child
node. This induces a hierarchical partition, or refinement of the input space as the depth of the tree
increases. Because of this successive refinement, an input vector mapping to a leaf node can be
represented with high precision by the path map from the root to the leaf, or with lower precision
by any prefix of the path. Thus TSVQ produces an embedded encoding of the data. If the depth
of the tree is R and the vector dimension is k, then bit rates 0/k, 1/k, ..., R/k can all be
achieved.
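
The embedded property can be sketched with a small binary tree. The tree below is hypothetical (scalar inputs, depth 2) and squared error is assumed as the distortion; the dictionary representation keyed by path strings is a convenience of this sketch, not a description of the stored tables.

```python
def tsvq_encode(x, tree, depth):
    """Return the path bits from the root to a leaf of a binary TSVQ.

    `tree` maps a bit-string path (e.g. "01") to the codeword at that node."""
    path = ""
    for _ in range(depth):
        left, right = tree[path + "0"], tree[path + "1"]
        d_left = sum((a - b) ** 2 for a, b in zip(x, left))
        d_right = sum((a - b) ** 2 for a, b in zip(x, right))
        path += "0" if d_left <= d_right else "1"
    return path

def tsvq_decode(path_prefix, tree):
    """Any prefix of the path decodes to a coarser codeword."""
    return tree[path_prefix]

# Hypothetical depth-2 tree for scalar inputs (dimension 1).
tree = {"": (128,), "0": (64,), "1": (192,),
        "00": (32,), "01": (96,), "10": (160,), "11": (224,)}
bits = tsvq_encode((100,), tree, depth=2)
print(bits, tsvq_decode(bits[:1], tree), tsvq_decode(bits, tree))
```

A decoder that receives only the first bit reconstructs a coarse value, and each further bit refines it, which is the scalability exploited over heterogeneous networks.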

Variable-rate TSVQs can be constructed by varying the depth of the tree. This can be done
by "greedily growing" the tree one node at a time (GGTSVQ), or by growing a large tree and
pruning back to minimize its average distortion subject to a constraint on its average length
(PTSVQ) or entropy (EPTSVQ). The last stage table outputs a fixed or variable length embedded
index which is sent to the decoder. The decoder has a copy of the last stage tree-structured
codebook and uses the index for the last stage to output the corresponding codeword.

Thus TSHVQ has the same structure as HVQ except that the last stage codebook and table
are tree-structured. Thus in TSHVQ the last stage table outputs a fixed or variable length
embedded index which is transmitted on the channel. The design of a TSHVQ again consists of
two major steps. The first step designs VQ codebooks for each stage. The codebooks for each
stage except the last stage of the TSHVQ are designed independently by the generalized Lloyd
algorithm (GLA) run on the appropriate vector size of the training sequence. The second step in
the design of TSHVQ builds lookup tables from the designed codebooks. After having built each
codebook, the corresponding code tables are built for each stage. All tables except the last stage
table are built using the procedure described above. The last stage table is designed by setting
$i^M(i_1^{M-1}, i_2^{M-1})$ to the variable length index i to which the concatenated vector
$(\beta_{M-1}(i_1^{M-1}), \beta_{M-1}(i_2^{M-1}))$ is encoded by the tree structured codebook.

In Classified Hierarchical Table-Lookup VQ (CHVQ), a classifier is used to decide the
class to which each input vector belongs. Each class has a set of HVQ tables designed based on
codebooks for that class. The classifier can be a nearest neighbor classifier designed by GLA or
an ad hoc edge classifier or any other type of classifier based on features of the vector, e.g., mean
and variance. The CHVQ encoder decides which class to use and sends the index for the class as
side information.

Traditionally, the advantage of classified VQ has been in reducing the encoding complexity
of full-search VQ by using a smaller codebook for each class. Here the advantage with CHVQ is
that bit allocation can be done to decide the rate for a class based on the semantic significance of
that class. The encoder sends side-information to the decoder about the class for the input vector.
The class determines which hierarchy of tables to use. The last stage table outputs a fixed or
variable length index which is sent to the decoder. The decoder has a copy of the last stage
codebook for the different classes and uses the index for the last stage to output the corresponding
codeword from the class codebook based on the received classification information.

Thus CHVQ has the same structure as HVQ except that each class has a separate set of
HVQ tables. In CHVQ the last stage table outputs a fixed or variable (entropy-constrained
CHVQ) length index which is sent to the decoder. The design of a CHVQ again consists of two
major steps. The first step designs VQ codebooks for each stage for each class as for HVQ or
ECHVQ. After having built each codebook the corresponding code tables are built for each stage
for each class as in HVQ or ECHVQ.

Product Hierarchical Table Lookup VQ reduces the storage complexity in coding a high
dimensional vector by splitting the vector into two or more components and encoding each split
vector independently. For example, an 8x8 block can be encoded as four 4x4 blocks, each
encoded using the same set of HVQ tables for a 4x4 block. In general, the input vector can be
split into sub-vectors of varying dimension where each sub-vector will be encoded using the HVQ
tables to the appropriate stage. The table and codebook design in this case is exactly the same as
for HVQ.

Mean-Removed Hierarchical Table-Lookup VQ (MRHVQ) is a form of product code to
reduce the encoding and decoding complexity. It allows coding higher dimensional vectors at
higher rates. In MRHVQ, the input vector is split into two component features: a mean (scalar)
and a residual (vector). MRHVQ is a mean-removed VQ in which the full search encoder is
replaced by table-lookups. In the MRHVQ encoder, the first stage table outputs an 8-bit index for
a residual and an 8-bit mean for a 2x1 block. The 8-bit index for the residual is used to index the
second stage table. The output of the second stage table is used as input to the third stage. The
8-bit means for several 2x1 blocks after the first stage are further averaged and quantized for the
input block and transmitted to the decoder independently of the residual index. The last stage table
outputs a fixed or variable length (entropy-constrained MRHVQ) residual index which is sent to
the decoder. The decoder has a copy of the last stage codebook and uses the index for the last
stage to output the corresponding codeword from the codebook and adds the received mean of the
block.

MRHVQ has the same structure as the HVQ except that all codebooks and tables are
designed for mean-removed vectors. The design of a MRHVQ again consists of two major steps.
The first step designs VQ codebooks for each stage as for HVQ or ECHVQ on the mean-removed
training set of the appropriate dimension. After having built each codebook the
corresponding code tables are built for each stage as in HVQ or ECHVQ.
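
The split into mean and residual can be sketched as follows. The sketch assumes an integer mean obtained by rounding and omits the quantization of the residual, which in MRHVQ would be performed by the table lookups; the function names are hypothetical.

```python
def mean_removed(block):
    """Split a block into a rounded integer mean and its residual vector."""
    mean = round(sum(block) / len(block))
    return mean, [v - mean for v in block]

def reconstruct(mean, residual_codeword):
    """Decoder side: add the received mean back to the residual codeword."""
    return [mean + r for r in residual_codeword]

mean, residual = mean_removed([100, 104, 98, 102])
print(mean, residual, reconstruct(mean, residual))
```

Because the residual has a much smaller range than the original block, a codebook of a given size can represent it more accurately.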

Multi-stage Hierarchical Table-Lookup VQ (MSHVQ) is a form of product code which
allows coding higher dimensional vectors at higher rates. MSHVQ is a multi-stage VQ in which
the full search encoder is replaced by a table-lookup encoder. In MSHVQ, the encoding is
performed in several stages. In the first stage the input vector is coarsely quantized using a set of
HVQ tables. The first stage index is transmitted as coarse-level information. In the second stage
the residual between the input and the first stage quantized vector is again quantized using another
set of HVQ tables. (Note that the residual can be obtained through table-lookups at the second
stage.) The second stage index is sent as refinement information to the decoder. This procedure
continues in which the residual between successive stages is encoded using a new set of HVQ
tables. There is a need for bit-allocation between the different stages of MSHVQ. The decoder
uses the transmitted indices to look up the corresponding codebooks and adds the reconstructed
vectors.
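
By way of illustration only, the successive quantization of residuals can be sketched as follows. Full search is used here in place of the HVQ tables, and the two small codebooks are hypothetical.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared error (full search)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def ms_encode(x, stage_codebooks):
    """Quantize x coarsely, then quantize each successive residual."""
    indices, residual = [], list(x)
    for cb in stage_codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices


def ms_decode(indices, stage_codebooks):
    """Look up one codeword per stage and add them together."""
    out = [0] * len(stage_codebooks[0][0])
    for i, cb in zip(indices, stage_codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


coarse = [[0, 0], [100, 100], [200, 200]]          # hypothetical first stage
fine = [[-8, -8], [-8, 8], [8, -8], [8, 8], [0, 0]]  # hypothetical second stage
x = [93, 110]
indices = ms_encode(x, [coarse, fine])
reconstruction = ms_decode(indices, [coarse, fine])
```

A decoder that receives only the first index can still output the coarse reconstruction, which is why the first stage index serves as coarse-level information.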

MSHVQ has the same structure as the HVQ except that it has several stages of HVQ. In
MSHVQ each stage outputs a fixed or variable (entropy-constrained MSHVQ) length index which
is sent to the decoder. The design of a MSHVQ consists of two major steps. The first stage
encoder codebooks are designed as in HVQ. The second stage codebooks are designed closed
loop by using the residual between the training set and the quantized training set after the first
stage. After having built each codebook the corresponding code tables are built for each stage
essentially as in HVQ or ECHVQ. The only difference is that the tables for the second and
subsequent stages are designed for residual vectors.



Hierarchical-Hierarchical Table-Lookup VQ (H-HVQ) again allows coding higher
dimensional vectors at higher rates. H-HVQ is a hierarchical VQ in which the full search encoder
is replaced by a table-lookup encoder. As in MSHVQ, the H-HVQ encoding is performed in
several stages. In the first stage a large input vector (super-vector) is coarsely quantized using a
set of HVQ tables to give a quantized feature vector. The first stage index is transmitted to the
decoder. In the second stage the residual between the input and the first stage quantized vector is
again quantized using another set of HVQ tables but the super-vector is split into smaller
sub-vectors. Note that the residual can be obtained through table-lookups at the second stage. The
second stage index is also sent to the decoder. This procedure of partitioning and quantizing the
super-vector by encoding the successive residuals is repeated for each stage. There is a need for
bit-allocation between the different stages of H-HVQ. The decoder uses the transmitted indices to
look up the corresponding codebooks and adds the reconstructed vectors. The structure of the
H-HVQ encoder is similar to that of MSHVQ except that in this case the vector dimensions at the
first stage and subsequent stages of encoding differ. The design of an H-HVQ is the same as that of
MSHVQ, the only difference being that the vector dimension decreases in subsequent stages.

Non-linear Interpolative Table-Lookup VQ (NIHVQ) allows a reduction in encoding and
storage complexity compared to HVQ. NIHVQ is a non-linear interpolative VQ in which the
full-search encoder is replaced by a table-lookup encoder. In NIHVQ, the encoding is performed as in
HVQ, except that a feature vector is extracted from the original input vector and the encoding is
performed on the reduced dimension feature vector. The last stage table outputs a fixed or variable
length (entropy-constrained NIHVQ) index which is sent to the decoder. The decoder has a copy
of the last stage codebook and uses the index for the last stage to output the corresponding
codeword. The decoder codebook has the optimal non-linear interpolated codewords of the
dimension of the input vector.
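
By way of illustration only, the relationship between the reduced-dimension encoder and the full-dimension decoder codebook can be sketched as follows. The pair-averaging feature, the training data and the use of full search are assumptions chosen for brevity; the decoder codewords are taken as the centroids of the training vectors assigned to each index, which is the usual construction for non-linear interpolative VQ.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared error (full search)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def feature(x):
    """Hypothetical feature: average of each adjacent pair (halves the dimension)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def design_decoder(training, encoder_codebook):
    """Full-dimension codeword per index: centroid of the training vectors
    whose feature vectors map to that index."""
    cells = {}
    for x in training:
        cells.setdefault(nearest(encoder_codebook, feature(x)), []).append(x)
    dim = len(training[0])
    return {
        i: [sum(v[d] for v in vs) / len(vs) for d in range(dim)]
        for i, vs in cells.items()
    }


encoder_codebook = [[10, 10], [100, 100]]  # hypothetical, dimension 2
training = [[8, 12, 9, 11], [12, 8, 11, 13], [95, 105, 98, 102], [105, 95, 104, 100]]
decoder_codebook = design_decoder(training, encoder_codebook)

x = [9, 11, 10, 10]
index = nearest(encoder_codebook, feature(x))  # encoding in reduced dimension
reconstruction = decoder_codebook[index]       # decoding in full dimension
```

The encoder never searches in the full dimension; only the decoder codebook carries codewords of the input dimension.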

The design of a NIHVQ consists of two major steps. The first step designs encoder VQ 
codebooks from the feature vector for each stage as for HVQ or ECHVQ. The last stage 
codebook is designed using nonlinear interpolative VQ. After having built each codebook the 
corresponding code tables are built for each stage for each class as in HVQ or ECHVQ.

Predictive Hierarchical Table-Lookup VQ (PHVQ) is a VQ with memory. The only
difference between PHVQ and predictive VQ (PVQ) is that the full search encoder is replaced by a
hierarchical arrangement of table-lookups. PHVQ takes advantage of the inter-block correlation in
images. PHVQ achieves the performance of a memory-less VQ with a large codebook while
using a much smaller codebook. In PHVQ, the current block is predicted based on the previously
quantized neighboring blocks using linear prediction and the residual between the current block
and its prediction is coded using HVQ. The prediction can also be performed using table-lookups
and the quantized predicted block is used for calculating the residual again through table-lookups.
The last stage table outputs a fixed or variable length index for the residual which is sent to the
decoder. The decoder has a copy of the last stage codebook and uses the index for the last stage to
output the corresponding codeword from the codebook. The decoder also predicts the current
block from the neighboring blocks using table-lookups and adds the received residual to the
predicted block.
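
By way of illustration only, the closed-loop prediction can be sketched as follows for a one-dimensional sequence of blocks. The predictor used here (the previous reconstructed block, that is, a single prediction coefficient of 1), the codebook and the use of full search are assumptions made for brevity.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared error (full search)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def pvq_encode(blocks, residual_codebook):
    """Predict each block from the previous reconstruction; quantize the residual."""
    prev = [0] * len(blocks[0])  # prediction state, mirrored in the decoder
    indices = []
    for x in blocks:
        residual = [a - p for a, p in zip(x, prev)]
        i = nearest(residual_codebook, residual)
        indices.append(i)
        # Update with the quantized block so encoder and decoder stay in step.
        prev = [p + c for p, c in zip(prev, residual_codebook[i])]
    return indices


def pvq_decode(indices, residual_codebook, dim):
    """Rebuild each block as prediction plus received residual codeword."""
    prev, out = [0] * dim, []
    for i in indices:
        prev = [p + c for p, c in zip(prev, residual_codebook[i])]
        out.append(prev)
    return out


codebook = [[0, 0], [16, 16], [-16, -16], [4, -4]]  # hypothetical residual codebook
blocks = [[15, 17], [30, 33], [35, 28]]
indices = pvq_encode(blocks, codebook)
reconstruction = pvq_decode(indices, codebook, 2)
```

Because the prediction is formed from quantized blocks rather than original ones, the decoder can reproduce it exactly from the indices alone.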

In PHVQ, all codebooks and tables are designed for the residual vectors. In PHVQ, the
last stage table outputs a fixed or variable (entropy-constrained PHVQ) length index which is sent
to the decoder. The design of a PHVQ consists of two major steps. The first step designs VQ
codebooks for each stage as for HVQ or ECHVQ on the residual training set of the appropriate
dimension (closed-loop codebook design). After having built each codebook the corresponding
code tables are built for each stage as in HVQ or ECHVQ. The only difference is that the residual
can be calculated in the first stage table.

Weighted Universal Hierarchical Table-Lookup VQ (WUHVQ) is a multiple-codebook VQ
system in which a super-vector is encoded using a set of HVQ tables and the one which minimizes
the distortion is chosen to encode all vectors within the super-vector. Side-information is sent to
inform the decoder about which codebook to use. WUHVQ is a weighted universal VQ (WUVQ)
in which the selection of codebook for each super-vector and the encoding of each vector within
the super-vector is done through table-lookups. The last stage table outputs a fixed or variable
length (entropy-constrained WUHVQ) index which is sent to the decoder. The decoder has a copy
of the last stage codebook for the different tables and uses the index for the last stage to output the
corresponding codeword from the selected codebook based on the received side-information.
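
By way of illustration only, the per-super-vector codebook selection can be sketched as follows. Full search and plain squared error are used in place of the table-lookups, and the two codebooks are hypothetical.

```python
def sqdist(a, b):
    """Squared-error distortion between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def encode_with(codebook, vectors):
    """Encode every vector with one codebook; return indices and total distortion."""
    idx = [min(range(len(codebook)), key=lambda i: sqdist(codebook[i], v)) for v in vectors]
    return idx, sum(sqdist(codebook[i], v) for i, v in zip(idx, vectors))


def wu_encode(super_vector, codebooks):
    """Choose the codebook with the least total distortion for the super-vector."""
    results = [encode_with(cb, super_vector) for cb in codebooks]
    choice = min(range(len(codebooks)), key=lambda k: results[k][1])
    return choice, results[choice][0]  # side-information, per-vector indices


def wu_decode(choice, indices, codebooks):
    """Use the side-information to pick the codebook, then look up each index."""
    return [codebooks[choice][i] for i in indices]


codebooks = [[[0, 0], [50, 50]], [[200, 200], [250, 250]]]  # hypothetical
super_vector = [[198, 203], [251, 248], [205, 199]]
choice, indices = wu_encode(super_vector, codebooks)
reconstruction = wu_decode(choice, indices, codebooks)
```

The value `choice` plays the role of the side-information: it is sent once per super-vector, while one index is sent per vector.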

WUHVQ has multiple sets of HVQ tables. The design of a WUHVQ again consists of
two major steps. The first step designs WUVQ codebooks for each stage as for HVQ or ECHVQ.
After having built each codebook the corresponding HVQ tables are built for each stage for each
set of HVQ tables as in HVQ or ECHVQ.

Simulation results have been obtained for the different HVQ algorithms. FIGS. 4-8 show
the PSNR (peak signal-to-noise ratio) results on the 8-bit monochrome image Lena (512x512) as a
function of bit-rate for the different algorithms. The codebooks for the VQs have been generated
by training on 10 different images. PSNR results are given for unweighted VQs; weighting
reduces the PSNR though the subjective quality of compressed images improves significantly.
One should however note that there is about 2 dB equivalent gain in PSNR by using a subjective
distortion measure.
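
By way of illustration only, PSNR for 8-bit images is conventionally computed as 10 log10(255^2 / MSE), where MSE is the mean squared error between the original and the reconstruction. The following sketch assumes images stored as lists of rows; the function name is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [
        (a - b) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for a, b in zip(row_o, row_r)
    ]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# A reconstruction that is off by one grey level at every pixel.
original = [[10, 20], [30, 40]]
reconstructed = [[11, 21], [31, 41]]
value = psnr(original, reconstructed)
```

On this scale a change of a few tenths of a dB, such as the 0.5-0.7 dB differences reported below, corresponds to a modest change in mean squared error.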

FIG. 4 gives the PSNR results on Lena for greedily-grown-then-pruned, variable-rate,
tree-structured hierarchical vector quantization (VRTSHVQ). The results are for 4x4 blocks
where the last stage is tree-structured. VRTSHVQ gives an embedded code at the last stage.
VRTSHVQ again gains over HVQ. There is again about 0.5-0.7 dB loss compared to
non-hierarchical variable-rate tree-structured table-based vector quantization (VRTSVQ).

FIG. 5 gives the PSNR results on Lena for different bit-rates for plain VQ and plain HVQ.
The results are on 4x4 blocks. We find that the HVQ performs around 0.5-0.7 dB worse than the
full search VQ. FIG. 4 also gives the PSNR results on Lena for entropy-constrained HVQ
(ECHVQ) with 256 codewords at the last stage. The results are on 4x4 blocks where the first
three stages of ECHVQ are fixed-rate and the last stage is variable rate. It can be seen that
ECHVQ gains around 1.5 dB over HVQ. There is however again a 0.5-0.7 dB loss compared to
ECVQ.

Classified HVQ performs slightly worse than HVQ in rate-distortion but has the advantage
of lower complexity (encoding and storage) by using smaller codebooks for each class. Product
HVQ again performs worse in rate-distortion complexity compared to HVQ but has much lower
encoding and storage complexity compared to HVQ as it partitions the input vector into smaller
sub-vectors and encodes each one of them using a smaller set of HVQ tables. Mean-removed
HVQ (MRHVQ) again performs worse in rate-distortion compared to HVQ but allows coding
higher dimensional vectors at higher rates using the HVQ structure.

FIG. 6 gives the PSNR results on Lena for hierarchical-HVQ (H-HVQ). The results are
for 2-stage H-HVQ. The first stage operates on 8x8 blocks and is coded using HVQ to 8 bits. In
the second stage the residual is coded again using another set of HVQ tables. Figure 11 shows
the results at different stages of the second-stage H-HVQ (each stage is coded to 8 bits).
Fixed-rate H-HVQ gains around 0.5-1 dB over fixed-rate HVQ at most rates. Multi-stage HVQ
(MSHVQ) is identical to H-HVQ where the second stage is coded to the original block size. Thus
the performance of MSHVQ can also be seen from Figure 11. There is again about 0.5-0.7 dB
loss compared to full search Shoham-Gersho HVQ results.

FIG. 7 gives the PSNR results on Lena for entropy-constrained predictive HVQ
(ECPHVQ) with 256 codewords at the last stage. The results are on 4x4 blocks where the first
three stages of ECPHVQ are fixed-rate and the last stage is variable rate. It can be seen that
ECPHVQ gains around 2.5 dB over fixed-rate HVQ and 1 dB over ECHVQ. There is however
again a 0.5-0.7 dB loss compared to ECPVQ.

FIG. 8 gives the PSNR results for entropy-constrained weighted-universal HVQ
(ECWUHVQ). The super-vector is 16x16 blocks for these simulations and the smaller blocks
are 4x4. There are 64 codebooks each with 256 4x4 codewords. It can be seen that ECWUHVQ
gains around 3 dB over fixed-rate HVQ and 1.5 dB over ECHVQ. There is however again a
0.5-0.7 dB loss compared to WUVQ.



The encoding times of the transform HVQ and plain HVQ are the same. It takes 12 ms for the
first stage encoding, 24 ms for the first two stages and 30 ms for the first four stages of encoding
a 512x512 image on a Sparc-10 Workstation. On the other hand JPEG requires 230 ms for
encoding at similar compression ratios. The encoding complexity of constrained and recursive
HVQs increases by a factor of 2-8 compared to plain HVQ. The HVQ based encoders are around
50-100 times faster than their corresponding full search VQ encoders.

Similarly the decoding times of the transform HVQ, plain HVQ, plain VQ and transform
VQ are the same. It takes 13 ms for decoding a 2:1 compressed image, 16 ms for decoding a 4:1
compressed image and 6 ms for decoding a 16:1 compressed 512x512 image on a Sparc-10
Workstation. On the other hand JPEG requires 200 ms for decoding at similar compression
ratios. The decoding complexity of constrained and recursive HVQs does not increase much
compared to that of HVQ. Thus the HVQ based decoders are around 20-30 times faster than a
JPEG decoder. The decoding times of transform VQs are the same as those of plain VQs as the
transforms can be precomputed in the decoder tables.

Thus to summarize, constrained and recursive HVQ structures overcome the problems of
fixed-rate memory-less VQ. The main advantage of these algorithms is very low computational
complexity compared to the corresponding VQ structures. Entropy-constrained HVQ gives a
variable rate code and performs better than HVQ. Tree-structured HVQ gives an embedded code
and performs better than HVQ. Classified HVQ, product HVQ, mean-removed HVQ, multi-stage
HVQ, hierarchical HVQ and non-linear interpolative HVQ overcome the complexity problems of
unconstrained VQ and allow the use of higher vector dimensions and achieve higher rates.
Predictive HVQ achieves the performance of a memory-less VQ with a large codebook while
using a much smaller codebook. It provides better rate-distortion performance by taking advantage
of inter-vector correlation. Weighted universal HVQ again gains significantly over HVQ in
rate-distortion. Further, some of these algorithms (e.g. PHVQ, WUHVQ) with subjective distortion
measures perform better than or comparably to JPEG in rate-distortion at a lower decoding complexity.

As indicated above, constrained and recursive vector quantizer encoders can be implemented by
table-lookups. These vector quantizers include entropy constrained VQ, tree-structured VQ,
classified VQ, product VQ, mean-removed VQ, multi-stage VQ, hierarchical VQ, non-linear
interpolative VQ, predictive VQ and weighted-universal VQ. Our algorithms combine these
different VQ structures with hierarchical table-lookup vector quantization. This combination
significantly reduces the complexity of the original VQ structures. We have also incorporated
perceptually significant distortion measures into HVQ based on weighting the coefficients of
arbitrary transforms. Essentially, the transforms are pre-computed and built into the encoder and
decoder lookup tables. Thus we gain the perceptual advantages of transform coding while
maintaining the computational simplicity of table-lookup encoding and decoding. These and other
modifications to and variations upon the preferred embodiments are provided for by the present
invention, the scope of which is defined by the following claims.

CLAIMS

What is claimed is: 

1. A data compression system comprising:

a vectorizer for converting said data into a series of data vectors selected from a set of distinct 
vectors; and 

a lookup table that maps said distinct vectors to a set of embedded codes so that one of said
codes is generated in response to each of said data vectors, said lookup table being coupled to said
vectorizer for receiving said data vectors therefrom.

2. A data compression system as recited in Claim 1 wherein said set of embedded codes 
include codes of different lengths. 

3. A data compression system as recited in Claim 1 wherein said set of embedded codes
includes plural subsets of said codes, no two of said subsets having the same number of said
codes, said lookup table having plural outputs, said lookup table outputting a code from each
subset for each of said image vectors from a respective one of said plural outputs.

4. In a method of designing a data compression system, the steps of: 

designing a tree structured codebook for a final stage of a lookup table so as to determine a 
final codebook containing a set of final codebook vectors, a set of final indexes being mapped to 
said set of final codebook vectors; and 

filling in said lookup table using a set of combinations of input vectors as table addresses, 
assigning said final indices to said addresses on the basis of the proximities of each of said 
combinations of input vectors to each of said final codebook vectors. 

5. In a method as recited in Claim 4 wherein said final codebook has a tree structure that is 
shaped according to a joint distortion and entropy criterion. 

6. In a method as recited in Claim 4 wherein: 

said designing step involves greedily growing and then pruning a tree-structured codebook
node-by-node subject to a joint constraint of distortion and entropy so as to define a series of trees
in which each tree having a predecessor tree differs from its predecessor tree in having two
additional nodes, each of said trees having a respective codebook of codebook vectors
corresponding to childless nodes of the respective tree, each of said codebooks having a respective
set of embedded codes including a first set of embedded codes and a second set of embedded
codes; and

said filling in step involves assigning a code from said first set and a code from said second set
to each of said combinations of input vectors.

7. In a computer method of designing a data compression table, a step of selecting codebook
vectors to minimize a distortion measure in a block transform domain.

8. In a method as recited in Claim 7 wherein said distortion measure is perceptually weighted.

9. In a method as recited in Claim 8 further comprising the steps of:

assigning indices to said codebook vectors; and

assigning said indices to combinations of image vector inputs on the basis of proximity of each
of said combinations to said codebook vectors, said proximity being determined according to a
proximity measure that is less weighted than said distortion measure.

10. In an image compression system, a data compression table designed in accordance with
the method of Claim 7.

11. In an image compression system, a data compression table designed in accordance with
the method of Claim 9.

12. An image compression system comprising: 

a vectorizer for converting an image into image vectors, each image vector representing a
respective one of mutually exclusive blocks of pixels of said image; and

table means for mapping each of said image vectors to an index as a function of a proximity 
measure of a block transform of the image vector to a codebook vector. 

13. In a computer method of designing a data compression table, a step of designing 
codebook vectors using a codebook design procedure for structured vector quantization. 

14. In an image compression system, a vector quantization table designed in accordance with
the method of Claim 13.

15. In a computer method of designing a vector quantization table, a step of designing a 
codebook using a joint measure of entropy and distortion. 

16. In an image compression system, a vector quantization table designed in accordance with
the method of Claim 15.

17. An image compression system comprising: 

a vectorizer for converting an image into vectors; and 

a vector quantization table that outputs a variable length code that provides a higher bit rate for 
a given distortion measure than any fixed length code. 

18. In a method of designing an image compression table, the steps of:

designing a codebook using a weighted distortion measure; and 

assigning table inputs to said codebook using a proximity measure that is less weighted than
said distortion measure.

19. In an image compression system, an image compression table designed using the method 
of Claim 17. 

20. In a computer method of designing a hierarchical data compression table, a step of
designing a codebook for a final stage table jointly using a constraint other than minimizing
distortion in conjunction with a constraint to minimize distortion.

21. An image compression system including a hierarchical data compression table designed in 
accordance with the method of Claim 20. 

22. In a computer method as recited in Claim 21, wherein said constraint is not used for
designing codebooks for preliminary stage tables.

23. An image compression system including a hierarchical vector quantization table in which
the codebook of the final stage table minimizes a joint measure including distortion and another 
constraint. 

24. An image compression system as recited in Claim 23 including preliminary stage
codebooks that minimize distortion without being subject to an additional constraint.

25. An image compression system comprising: 

vectorizing means for converting an image into image vectors; and 

table means for converting said vectors into embedded codes, said table means having address
input means for receiving said image vectors, said address input means being coupled to said
vectorizing means.

26. An image compression system as recited in Claim 24 wherein said table means includes
multiple stages, said multiple stages including a first stage having inputs coupled to said
vectorizing means for receiving said vectors, said multiple stages including a final stage for
outputting said embedded codes, each of said multiple stages other than said first stage having
inputs for receiving outputs from the preceding one of said multiple stages.

27. An image compression system comprising:

vectorizer means for converting a digital image into a series of image vectors, each of said 
image vectors being selected from a set of image vectors defined in a pixel space, each of said 
image vectors in said series representing a respective one of a set of mutually exclusive blocks that 
collectively constitute said digital image; and 

table means for mapping many-to-one said image vectors of said set to indices
from a set of indices, each of said indices being decodable to yield a vector in a non-pixel space.